ABSTRACT


Evaluating a Novel Eye-Tracking Tool to Detect Invalid Responding in Neurocognitive
Assessment

David M. Barry, Ph.D., 2015

Thesis directed by: Mark Ettenhofer, Ph.D., Assistant Professor, Department of Medical
and Clinical Psychology

INTRODUCTION: Valid symptom report and test performance are essential
prerequisites for the accurate interpretation of neurocognitive or neuropsychological
assessment data. Unfortunately, base rates of invalid responding in civilian and military
contexts suggest that symptom exaggeration and underperformance are common in these
types of assessments. Many response validity tests (RVTs) have been developed and
derived to detect invalid responding, but these measures are limited by lengthy
administration times, limited sensitivity, and susceptibility to coaching. This dissertation
project evaluated a novel eye-tracking tool, the Bethesda Eye & Attention Measure
(BEAM), as a method for detecting invalid responding in neurocognitive assessment.
METHODS: A prospective simulator study compared neurocognitive battery
performance between two groups of healthy adults: an unbiased group (n=26) instructed to
perform their best and a biased group (n=24) instructed to simulate deficits associated
with head injury. The biased group was also warned to fake believably. Results
from the simulator study were cross-validated in a clinical sample of unbiased responders
with a history of mild TBI (n=19).

RESULTS: Of the 29 BEAM metrics evaluated in the simulator study, 12 demonstrated
outstanding classification accuracy (AUC > .90). Overall Saccadic Reaction Time Intra-
Individual Variability (AUC = .97) and Overall Manual Reaction Time Intra-Individual
Variability (AUC = .97) demonstrated the best classification accuracy among the BEAM
variables. The BEAM performed favorably when compared to well-validated embedded
and freestanding response validity tests, including the CPT-II, WAIS-IV Digit Span,
Trail Making Test A & B, MSVT, and VSVT. Several BEAM metrics identified in the
simulator study demonstrated outstanding classification accuracy in the clinical sample.
DISCUSSION: The BEAM demonstrated considerable promise as a tool to detect
invalid responding in neurocognitive assessment. Consistent with the literature on
continuous performance tests, BEAM reaction time intra-individual variability,
omissions, and commissions demonstrated the best classification of invalid responding
behavior in both experimental and clinical samples. This study adds to the extant
response validity literature by demonstrating that saccadic performance in a continuous
performance test may be used to detect invalid responding. Results from the simulator
study were cross-validated in a clinically relevant population, providing preliminary
evidence supporting the BEAM’s clinical utility as a response validity test.

Additional research should evaluate the BEAM’s ability to identify invalid responding in
larger, more heterogeneous groups of persons with and without neurological conditions.
CHAPTER 1: Introduction


Introduction and Scope of Problem

Invalid responding, symptom exaggeration, malingering, faking behaviors, and
other forms of inaccurate representations of abilities and conditions have transcended
scientific and clinical disciplines for millennia. In their excellent piece on the history of
deliberately misrepresented illnesses, Carone and Bush (44) provide several examples of
this phenomenon throughout recorded history: David from the Hebrew Bible acting
erratically and drooling on his beard to escape persecution, Odysseus pretending to be
insane to avoid fighting in the Trojan War, and notes from a 2nd-century Greco-Roman
physician documenting the simulation of pain and injury to avoid responsibilities. From
ancient times to the present, humans have presented with invalid physical and
psychiatric problems, often doing so to achieve a desired result. The implications
of such behaviors are not trivial; misrepresented physical and psychiatric conditions can
influence medical treatments, return-to-duty evaluations, social expectations, and other
clinical outcomes across a number of settings.

Since the early 1990s, the scientific field of neuropsychology has taken an
increasingly active role in understanding feigned, exaggerated, or otherwise invalid
symptom and ability presentation. Neuropsychology is a clinical and experimental
branch of psychology that studies the structure and function of the brain in relation to
behaviors, emotions, cognitions, physical capacities, and symptom presentations (116).

In turn, neuropsychologists aim to study, assess, and treat behaviors directly related to
brain functioning by administering comprehensive evaluations to human subjects of
interest. Once the assessments are scored, neuropsychologists can form diagnostic
impressions and clinical inferences regarding a person’s physical, behavioral, emotional,
and cognitive functioning and prognosis. In essence, neuropsychological assessment is a
quantitative, standardized means of measuring complex aspects of human behavior and
cognition, such as attention, memory, visuospatial and perceptual skills, language,
reasoning, planning, and emotional processing (163).

A central principle of neuropsychological or “neurocognitive” assessment
posits a relationship between performance on neuropsychological tests and the actual
condition of the brain (164). As such, neuropsychological assessment depends on the
examinee’s full effort and accurate symptom report for the results to be valid (154). If
the results of a given neuropsychological assessment tool are invalid, the results cannot
by definition be related to brain function. Knowing whether test data are valid is essential
for drawing conclusions, making diagnoses, and recommending treatments (163). It is
therefore imperative for examiners to consider both the psychometric properties of the
measures used in neuropsychological assessment (e.g., internal factors; 87) and
factors that may influence performance, such as environmental effects (181) or rapport
with the examiner (104).

This manuscript focuses on a particular factor that influences test outcome:
response validity, or the validity of one’s performance and symptom presentation during
neurocognitive assessment. Neurocognitive assessment depends on its examinees to
provide accurate symptom report and an adequate level of effort to perform well throughout
the testing process (132; 154). Knowingly or unknowingly, however, some persons
undergoing neurocognitive assessment may give misleading responses or perform at
levels that do not reflect their actual neurocognitive status (163). An individual’s response
validity can be impacted by multiple patient factors, such as pursuit of secondary gain,
fatigue, stress, medical conditions, psychiatric conditions¹, and medications, as well as
external factors, such as testing environment (e.g., limited space, excessive ambient
noise), examiner skill, unclear assessment instructions, and language/cultural barriers (11;
181).

Invalid presentations on neurocognitive assessment are characterized as those not entirely
attributable to neurological conditions, not significantly influenced by demographic variables or
performance confounds (e.g., fatigue, medication), and significantly worse than
expected scores for persons with genuine brain disorders (116). Beyond invalidating test
results, invalid responding behavior can lead to significant individual, economic, and
societal consequences, such as undetected neurological problems, improperly awarded
financial settlements, and overcompensated disability claims (215). Unfortunately,
symptom exaggeration and insufficient effort to perform well are frequently identified in
neuropsychological evaluations, especially in forensic settings (34; 41; 154).

Invalid responding behavior can manifest on performance tests of abilities (e.g.,
cognitive abilities, motor abilities) and self-report measures (e.g., symptom presentation
measures). Unfortunately, subjective methods for detecting response bias, such as
clinical judgment (i.e., clinical intuition), pattern analysis, and discrepancy methods, are
insufficient and influenced by cognitive biases (106; 115). To more accurately detect
invalid performance and exaggerated symptom presentation, clinicians utilize a variety of
stand-alone validity tests and validity indices embedded within assessment measures.
When examinees score beyond the validity cutoffs on these measures or indices, clinicians have
objective evidence suggesting the presence of invalid data.
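The cutoff logic described above can be sketched in a few lines of Python. This is a minimal illustration rather than the scoring procedure of any specific validity test: it assumes that lower scores indicate poorer performance, and the function names, cutoff, and scores are hypothetical. Sensitivity is the share of known invalid responders who are flagged; specificity is the share of valid responders who are not.

```python
from typing import Sequence


def classify_invalid(scores: Sequence[float], cutoff: float) -> list:
    """Flag each score below the cutoff as invalid.

    Assumes lower scores mean poorer performance, as on many forced-choice
    validity tests; the cutoff is a hypothetical example value.
    """
    return [score < cutoff for score in scores]


def sensitivity_specificity(
    flags: Sequence[bool], truly_invalid: Sequence[bool]
) -> tuple:
    """Compare flags with known group membership (e.g., simulators vs. controls).

    Sensitivity: share of truly invalid cases that were flagged.
    Specificity: share of truly valid cases that were not flagged.
    """
    pairs = list(zip(flags, truly_invalid))
    tp = sum(1 for f, t in pairs if f and t)
    fn = sum(1 for f, t in pairs if not f and t)
    tn = sum(1 for f, t in pairs if not f and not t)
    fp = sum(1 for f, t in pairs if f and not t)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Invented scores; True marks a participant instructed to feign.
    scores = [45, 48, 30, 35, 50, 38]
    truly_invalid = [False, False, True, True, False, False]
    flags = classify_invalid(scores, cutoff=40)
    print(sensitivity_specificity(flags, truly_invalid))  # (1.0, 0.75)
```

Under this assumption, raising the cutoff flags more examinees, trading specificity for sensitivity; the area under the ROC curve (AUC) summarizes that trade-off across all possible cutoffs.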

¹ Somatoform or cogniform disorders
Largely due to a surge of interest and research in response validity in the 1990s
(254), researchers were able to estimate base rates of deliberate misrepresentations of
abilities or symptoms in a host of civil and criminal settings (154). Several decades of
research suggest that symptom exaggeration and response bias occur in 30-50% of
neuropsychological evaluations with potential for secondary gain (158; 183). This high
rate of suboptimal performance is even more disturbing considering that one’s
effort to perform optimally on neurocognitive tests accounts for more variance in test scores than brain
injury severity (82; 95; 99; 150; 178). In the past decade, the National Academy of
Neuropsychology (NAN) and the American Academy of Clinical Neuropsychology
(AACN) recommended that response validity assessment be included in all evaluations of
brain and behavior (41; 116). The NAN paper added that effort and symptom validity
assessment could be conducted with specially designed tests, indices, and observations of
effort, as well as other non-specific metrics (41).

Researchers have been evaluating methods of response bias detection for decades
in order to adapt to the ever-evolving patterns and presentations of response bias (218).

A common research design used to study invalid responding is the “simulator study,”
where groups of participants instructed to perform their best are compared to groups of
participants instructed to simulate deficits (248; 272). Simulation studies, also known as
“analog research on dissimulation,” allow for an experimental design where subjects can
be randomly assigned to dissimulating or nondissimulating conditions (154). Since
neurocognitive assessment often deals with neurological injury or illness, simulation
study designs frequently utilize actual and simulated traumatic brain injury (TBI) groups
for comparisons to healthy controls (23; 249-251; 279; 282). TBI assessment,
particularly with mild TBI, often relies on self-report and is thus vulnerable to response
bias (242; 266; 286). This diagnostic limitation makes TBI groups an attractive
population for studying feigned cognitive deficits.

Computerized continuous performance tests have been used to study attention,
executive functioning, and information processing in TBI populations (33; 54).
Continuous performance tests have recently been evaluated as performance validity
measures, with metrics such as button-press reaction time, button-press reaction time
variability, omission errors, and commission errors showing acceptable sensitivity
to invalid responding (42; 118; 119; 149; 198). Oculomotor metrics and eye
tracking methodologies have also been identified as promising tools for detecting invalid
responding on neurocognitive measures (111).
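The continuous performance test metrics named above can be made concrete with a short Python sketch. It assumes a hypothetical trial format (whether each stimulus was a target, plus the response latency or no response), operationalizes intra-individual variability as the standard deviation of correct-response reaction times (one common choice among several), and computes a rank-based (Mann-Whitney) AUC, the classification accuracy statistic reported for the BEAM in the Abstract. None of this reproduces the BEAM's actual scoring.

```python
import statistics
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Trial:
    """One CPT trial in a hypothetical record format."""
    is_target: bool          # True if the stimulus called for a response
    rt_ms: Optional[float]   # response latency in ms; None if no response


def cpt_metrics(trials: Sequence[Trial]) -> dict:
    """Summarize one examinee's trials.

    Omissions are missed targets; commissions are responses to non-targets.
    Reaction-time statistics use correct target responses only, and
    statistics.stdev needs at least two of them.
    """
    omissions = sum(t.is_target and t.rt_ms is None for t in trials)
    commissions = sum((not t.is_target) and t.rt_ms is not None for t in trials)
    hit_rts = [t.rt_ms for t in trials if t.is_target and t.rt_ms is not None]
    return {
        "omissions": omissions,
        "commissions": commissions,
        "mean_rt": statistics.mean(hit_rts),
        # Intra-individual variability, operationalized here as the sample SD.
        "rt_sd": statistics.stdev(hit_rts),
    }


def auc(invalid_scores: Sequence[float], valid_scores: Sequence[float]) -> float:
    """Rank-based (Mann-Whitney) AUC.

    Estimates the probability that a randomly chosen invalid responder scores
    higher than a randomly chosen valid responder, counting ties as one half.
    Assumes higher scores (e.g., more RT variability) indicate invalidity.
    """
    wins = 0.0
    for x in invalid_scores:
        for y in valid_scores:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(invalid_scores) * len(valid_scores))


if __name__ == "__main__":
    # Invented example trials, for illustration only.
    trials = [
        Trial(True, 400.0), Trial(True, 500.0), Trial(True, None),
        Trial(False, 350.0), Trial(False, None), Trial(True, 600.0),
    ]
    print(cpt_metrics(trials))
    # Invented RT-SD values for two small groups.
    print(auc([300.0, 250.0, 200.0], [100.0, 150.0, 200.0]))
```

The pairwise AUC loop grows with the product of the two group sizes, which is negligible for samples of a few dozen participants per group.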

This dissertation project investigated a novel method of detecting invalid responding
that combines continuous performance tests with oculomotor functioning. To provide
sufficient context for this study, the following literature review will present the problems
associated with invalid symptom and ability presentation, describe the terminology used
throughout the literature, and address the strengths and limitations of the current methods
used to detect invalid responding. Next, the literature review will describe the potential
for reaction time and oculomotor assessment to detect invalid responding. Lastly, a
novel, multimodal neurocognitive assessment tool, the Bethesda Eye & Attention
Measure (BEAM), will be described.

Traumatic Brain Injury and Validity Concerns

Traumatic brain injury (TBI) presents a significant public health and economic
concern in the United States and worldwide. In 2009, there were approximately 3.5
million TBIs recorded in U.S. medical settings (56). While the vast majority (~70-90%)
of treated TBIs in the U.S. are classified as “mild” (i.e., concussion; 22; 45; 47), the
overall impact of TBI is anything but mild. According to population-based data obtained
between 2002 and 2006, TBI contributes to nearly one-third of all injury-related deaths in the
United States (76). TBI is associated with cognitive, psychological, physical, and
behavioral sequelae that often lead to disability (131; 237). The total lifetime costs of
all the fatal, hospitalized, and nonhospitalized cases of TBI that were medically treated in
2000 were estimated to be $60.4 billion (including productivity losses of $51.2 billion),
with per-person lifetime costs approaching $45,000 (57). More recently, McCrea (171)
estimated TBI to have a $100 billion annual impact on the U.S. economy in terms of
medical costs and lost productivity.

Military-related TBI has dramatically added to the U.S. economic cost in the past
decade, primarily due to the wars in Iraq and Afghanistan. It has been estimated that
approximately 15-20% of all U.S. Service Members who deploy to Iraq or Afghanistan
sustain at least one mild TBI (123; 258; 260). Using a standard cost-of-illness approach,
the RAND Corporation (258) estimates that the average cost (in 2005 dollars) of a deployment-
related TBI to the U.S. economy ranges from $148,573 to $222,000 per TBI. The total
U.S. economic cost of deployment-related TBI from 2001 to 2005 is estimated to be
between $90,629,389 and $135,419,773 (258). The median annual cost for TBI-diagnosed
OIF/OEF Veterans was nearly four times higher than for OIF/OEF Veterans without a history
of TBI (259). From January 1, 2000 through the fourth quarter of 2013, there were
294,172 documented cases of all-severity TBI among Department of Defense Service
Members (62). Of those, 242,676 (82.5%) were classified as mild TBI, 23,754 (8.1%) as
moderate TBI, 4,389 (1.5%) as penetrating, 2,920 (1.0%) as severe TBI, and 20,433
(6.9%) were not classifiable (62).
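The severity percentages above can be checked directly against the case counts. The short Python snippet below performs only that arithmetic; the dictionary labels are illustrative.

```python
# DoD case counts, January 1, 2000 through Q4 2013, as reported above (62).
counts = {
    "mild": 242_676,
    "moderate": 23_754,
    "penetrating": 4_389,
    "severe": 2_920,
    "not classifiable": 20_433,
}

total = sum(counts.values())
# Percentage of all documented cases, rounded to one decimal place.
shares = {severity: round(100 * n / total, 1) for severity, n in counts.items()}

print(total)  # 294172
for severity, pct in shares.items():
    print(f"{severity}: {pct}%")
```

The categories sum exactly to the reported total, and each rounded share matches the percentage given in the text.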

A recent meta-analysis of epidemiological studies of TBI in developed countries
estimates lifetime prevalence of at least one TBI with loss of consciousness (LOC) in the
general population to be 12%, with 16.7% lifetime prevalence for males and 8.5% for
females (86). Each year in the U.S., more than a million people receive emergency room
treatment for TBI, with 235,000 eventually being hospitalized and 50,000 dying from
their head injuries (152). It is likely that the true incidence and prevalence of TBI-related
disability are higher than these estimates, as the numbers do not incorporate unreported
TBI or TBI treated outside of civilian hospitals (57). Recently, a population-based
incidence study of TBI in New Zealand incorporated registered and nonregistered cases
of TBI in its estimates, and the authors reported a mild TBI incidence of 749 (95% CI
709-790) per 100,000 person-years, a much higher incidence estimate than previously
reported in studies from other developed countries (77).

With advances in modern medicine and neuroimaging, more civilians and Service
Members are surviving TBI. As a result of reduced mortality rates, an ever-increasing
number of people are living with major functional and cognitive disabilities (171).
Between 3.17 and 5.3 million U.S. citizens (roughly 10% of all disabled Americans) are
estimated to be living with permanent TBI-related disability (152; 262; 285). An
estimated 43.3% of Americans have residual disability one year following TBI-related
hospitalization (237).

Traumatic brain injury, like other disabling injuries or illnesses, has a profound
impact on the U.S. economy. Unfortunately, TBI and other conditions that often rely
primarily on patient self-report for diagnosis are especially vulnerable to patient
misrepresentation, provider misclassification, and improper disability compensation (242;
266; 286). While this diagnostic dilemma may prove difficult in many clinical and
research settings, it makes TBI, and mild TBI in particular, an ideal population for
studying feigned deficits (100).

Costs of Undetected Symptom Exaggeration and Invalid Responding

Symptom exaggeration and feigned disability can significantly impact the U.S.
economy if undetected. It is estimated that 30-50% of all disability compensation
evaluations involve some form of symptom exaggeration and/or invalid responding (158;
183), and that the total annual cost of insurance fraud to the U.S. economy approximates
$85.3 billion (166). In 2008 alone, the Social Security Administration (SSA) and other
governmental programs spent approximately $428.5 billion in payments to working-age
persons who met disability criteria (a sizable increase from the $280 billion spent in
2002; 51). Of that $428.5 billion, Chafetz (51) estimates that $42.85 to $180 billion (i.e.,
10-42% of total expenditures) were spent on claimants with possible, probable, or
definite misrepresentations of disability.

Invalid representation of abilities or symptoms also poses a significant problem
for the Department of Veterans Affairs (VA), the agency that oversees the Veterans’
Disability Compensation (VDC) program. In fiscal year 2004, nearly 2.5 million
Veterans (10.2% of the total U.S. Veteran population) received disability compensation,
with an average annual disability compensation payment of $8,378 (143). Applying
Chafetz’s (49; 51) base rates of symptom exaggeration and underperformance (i.e., 10-42%
of the roughly $20.9 billion paid that year), between
$2.1 billion and $8.8 billion was likely spent in fiscal year 2004 on VA disability
compensation involving exaggerated symptoms or underrepresented abilities.
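The VA estimate follows from a few steps of arithmetic: convert Chafetz's dollar range into proportions of the 2008 SSA and related expenditures, compute the total FY2004 VA compensation outlay, and apply those proportions to it. The Python sketch below uses only figures quoted in the text and treats "nearly 2.5 million" as exactly 2.5 million, so its results are approximate.

```python
# Figures quoted in the text; dollar amounts in billions.
SSA_TOTAL_2008 = 428.5                      # payments to disabled working-age persons (51)
CHAFETZ_LOW, CHAFETZ_HIGH = 42.85, 180.0    # estimated misrepresentation range (51)

# Step 1: express Chafetz's dollar range as base rates.
low_rate = CHAFETZ_LOW / SSA_TOTAL_2008     # about 0.10
high_rate = CHAFETZ_HIGH / SSA_TOTAL_2008   # about 0.42

# Step 2: total FY2004 VA compensation ("nearly 2.5 million" treated as 2.5 million).
VA_RECIPIENTS = 2_500_000
AVG_PAYMENT_DOLLARS = 8_378                 # average annual payment (143)
va_total_bn = VA_RECIPIENTS * AVG_PAYMENT_DOLLARS / 1e9   # about 20.9

# Step 3: apply the base rates to the VA total.
print(f"Base rates: {low_rate:.0%} to {high_rate:.0%}")
print(f"${va_total_bn * low_rate:.1f} to ${va_total_bn * high_rate:.1f} billion")  # $2.1 to $8.8 billion
```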

Symptom exaggeration and ability misrepresentation in disability
compensation contexts clearly constitute major problems on a national economic level. Federal
and state programs designed to support disabled civilians and Veterans may be
improperly compensating individuals by tens of billions of dollars each year. As budgets
at all levels of government are scrutinized for waste, fraud, and abuse, greater
emphasis is needed on detecting invalid responding in disability
contexts. Undetected symptom exaggeration and invalid responding create a significant
financial and societal burden. Conversely, implementing effective, evidence-based
methods to detect invalid responding in federal disability programs could reduce this
burden and save millions, perhaps billions, of dollars each year. Given the potential cost
savings, additional research on invalid responding behavior and detection is needed.

Relevant Terminology Used in the Extant Literature Describing Invalid Responding Behavior

Proper research on response validity and its assessment first requires a method to
operationalize constructs (28). Unfortunately, a plethora of loosely defined words, terms,
and definitions have been used to describe similar constructs throughout the literature on
response validity. One of the most common (and controversial) terms used in the
literature is malingering, which was derived from the French word malingre, meaning
“sickly” (174). Another popular term, effort, has been utilized to indicate the amount of
mental and/or physical energy expended in performing a task at capacity levels (41; 99;
246). Negative response bias and dissimulation have also been used to describe the
misrepresentation of abilities, either from over-representing or under-representing a true
set of symptoms during evaluation (41; 116).

Not surprisingly, several terminology “camps” have emerged. While some
researchers staunchly defend use of the term malingering or some derivation thereof
(244), others advocate using alternate terms such as suspect effort (14) or incomplete
effort (12). Boone (34) prefers terminology that describes invalid responding behavior
irrespective of intent, such as noncredible performance/symptoms, negative response
bias, or non-physiological, suspect, or suboptimal effort. This nomenclature dilemma
stems from an important debate involving the ever-evolving construct of malingering, the
intentionality of suboptimal performance and presentation, and what “effort”
actually means from a biopsychosocial perspective (29; 30; 156; 246).

As a result of the terminology debates, the list of terms associated with response
validity in the context of symptom presentation, effort, and performance has grown into a
veritable thesaurus. In addition to the aforementioned terms, it is not uncommon to find
feigned cognitive impairment, nonorganic signs and symptoms, insufficient effort, invalid
effort, invalid/failed performance, cognitive malingering, faking bad, symptom
amplification, performance exaggeration, underperformance/distortion, symptom
embellishment, disingenuous, or faked in a neurocognitive evaluation report (28; 30; 34).
Bigler (30) argues that scientists have created a tautological problem of unnecessary and
repetitive use of different words, terms, and acronyms with similar meanings, creating
communication barriers for researchers and clinicians across scientific disciplines.

To adequately review the literature on symptom exaggeration and response
validity in neurocognitive test performance, it is necessary to first review the
terminology used to describe it. The following section will present relevant constructs
and their definitions, including malingering, effort, and other terms used to describe
validity assessment measures. The intent of this section is to clarify the similarities and
differences among common terms in the response validity literature, and to arrive at a
consistent terminology and construct operationalization for use in this manuscript.

Malingering: A Judgmental Description

Merriam-Webster (174) defines malingering as the pretending or exaggeration of
incapacity or illness so as to avoid duty or work. In the U.S. military, malingering is a
punishable offense under Article 115 of the Uniform Code of Military Justice (UCMJ;
206). According to the UCMJ (211, np), malingering is done for the “purpose of avoiding
work, duty, or service,” and manifests as (1) the intentional infliction of self-injury, or (2)
the feigning of “illness, physical disablement, mental lapse, or derangement.”

The American Psychiatric Association (APA) listed the term malingering in the
original Diagnostic and Statistical Manual of Mental Disorders (DSM; 2) as a
supplemental term in an appendix without specific criteria. In DSM-II (3), malingering
was described as a conscious behavior that needed to be distinguished from Hysterical
Neurosis, Conversion Type. In 1980, DSM-III listed malingering as a “V” code, a
condition not considered to be a mental disorder per se but still worthy of clinical
attention. The original DSM-III malingering criteria were the presence of false or
exaggerated physical or psychological symptoms, voluntarily produced in the pursuit of
an obvious, recognizable goal. Minor changes were made to the malingering criteria in
DSM-III-R (5), DSM-IV (6), and DSM-IV-TR (7), and the criteria appear to remain the
same for DSM-5 (26). DSM-IV-TR (7) indicates that malingering may represent an
adaptive behavior and recommends strong consideration of malingering if one or more of
the following is present: (1) medicolegal context, (2) marked discrepancy between
objective findings and a person’s symptom report, (3) poor treatment compliance or
rapport with provider, and (4) the presence of antisocial personality disorder. Of note,
the DSM-IV-TR (7) states that malingering must be distinguished from both factitious
disorder (i.e., conscious symptom generation to fulfill the “sick role”) and somatoform
disorder (i.e., unconscious symptom generation).
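The DSM-IV-TR rule above is disjunctive: any single indicator is enough to prompt strong consideration of malingering. A minimal Python sketch of that logic follows; the class and field names are hypothetical, and the function only flags a presentation for further scrutiny rather than rendering a diagnosis.

```python
from dataclasses import dataclass


@dataclass
class Presentation:
    # Hypothetical field names for the four DSM-IV-TR indicators (7).
    medicolegal_context: bool
    symptom_objective_discrepancy: bool
    poor_compliance_or_rapport: bool
    antisocial_personality_disorder: bool


def warrants_malingering_consideration(p: Presentation) -> bool:
    """Return True if one or more indicators is present.

    This flags the need for strong consideration only; it is not a
    diagnosis and cannot separate malingering from factitious or
    somatoform disorders.
    """
    return any([
        p.medicolegal_context,
        p.symptom_objective_discrepancy,
        p.poor_compliance_or_rapport,
        p.antisocial_personality_disorder,
    ])


if __name__ == "__main__":
    case = Presentation(True, False, False, False)
    print(warrants_malingering_consideration(case))  # True
```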

While the DSM’s definition of malingering has changed somewhat over the past
several editions, ample evidence suggests its diagnostic criteria are
untenable for use in clinical practice and research (26). To provide a more
reliable framework for malingering in clinical and research settings, particularly in the
field of forensic neuropsychology, Slick, Sherman, and Iverson (244) introduced
differential diagnostic criteria for Possible, Probable, and Definite Malingered
Neurocognitive Dysfunction (MND). This set of criteria has since become the most
commonly used diagnostic standard in neuropsychological research (246). MND is
defined as “the volitional exaggeration or fabrication of cognitive dysfunction for the
purpose of obtaining substantial material gain” (e.g., financial compensation for injury),
“or avoiding or escaping legally obligated formal duty” (e.g., child support payments,
military deployments) “or responsibility” (e.g., competency to stand trial; 244; p. 552).

Borrowing heavily from the 1999 Slick et al. criteria, Bianchini, Greve, and
Glynn (28; p. 407) introduced Malingered Pain-Related Disability (MPRD) as the
“intentional exaggeration or fabrication of cognitive, emotional, behavioral, or physical
dysfunction attributed to pain for the purposes of obtaining financial gain, to avoid work,
or to obtain drugs (incentive),” and provided differential diagnostic criteria for Possible,
Probable, and Definite MPRD. The authors proposed that compelling inconsistencies
(i.e., unambiguous discrepancies) between examinee symptom report, test performance,
and/or behavior should be considered pathognomonic of malingering when external
incentives are present and the behaviors are not better accounted for by legitimate
neurological or psychiatric disorders. Unlike the 1999 Slick et al. criteria, MPRD
incorporates evidence from a physical evaluation into its criteria.

More than a decade after first operationalizing malingered neurocognitive
dysfunction, Slick and Sherman (245) recently published a set of revised diagnostic
criteria for what they now call Malingered Neuropsychological Dysfunction (note the shift
from “neurocognitive” to “neuropsychological”; the MND acronym is unchanged). Instead of
Possible, Probable, or Definite subtypes
of one diagnosis, MND, Slick and Sherman (245) propose Probable or Definite subtypes
of three separate diagnoses: Primary MND, Secondary MND, and MND by Proxy.
Primary MND is diagnosed when external incentive is present,
exaggeration/fabrication of neuropsychological problems or deficits is detected, and
the behaviors are not substantially accounted for by psychiatric, neurological, or
developmental factors (245; 246). Secondary MND incorporates the possibility of
diminished cognitive capacity and/or inability to control one’s behavior due to legitimate,
severe cognitive/psychiatric dysfunction (e.g., severe TBI, schizophrenia, or mental
retardation; 245; 246). MND by Proxy is diagnosed when minors meet criteria for
malingering primarily as a result of the intentional influence or control of an adult (245;
246).
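The decision structure of the revised criteria, as summarized in this paragraph, can be sketched as a small Python function. This is a deliberately simplified reading: the inputs are hypothetical yes/no judgments, the Probable and Definite subtypes are not modeled, and the order of the checks is an assumption made for illustration rather than part of Slick and Sherman's published criteria.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Findings:
    # Hypothetical, simplified inputs; the published criteria are far richer.
    external_incentive: bool
    exaggeration_detected: bool
    explained_by_other_factors: bool  # psychiatric, neurological, developmental
    severe_dysfunction: bool          # e.g., severe TBI limiting self-control
    is_minor: bool
    adult_influence: bool


def mnd_category(f: Findings) -> Optional[str]:
    """Map findings onto the three diagnoses summarized in the text.

    Returns None when none applies. Check order is an illustrative assumption.
    """
    # All three diagnoses presuppose incentive plus detected exaggeration.
    if not (f.external_incentive and f.exaggeration_detected):
        return None
    # A minor acting primarily under an adult's influence.
    if f.is_minor and f.adult_influence:
        return "MND by Proxy"
    # Legitimate severe dysfunction that may diminish capacity or control.
    if f.severe_dysfunction:
        return "Secondary MND"
    # Not substantially accounted for by other factors.
    if not f.explained_by_other_factors:
        return "Primary MND"
    return None


if __name__ == "__main__":
    case = Findings(True, True, False, False, False, False)
    print(mnd_category(case))  # Primary MND
```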
While multiple empirically supported diagnostic criteria for malingering exist,
researchers and clinicians often avoid using the term malingering due to its pejorative,
judgmental connotation and potential legal ramifications, preferring “softer” terms
instead (132; 227). In fact, the Social Security Administration (SSA) admonishes its
psychologists not to use the term malingering because it is “too subjective” (50).
Malingering and its related terminology remain highly controversial, mainly because
malingering, by definition, is an intentional process (246).

Proponents of malingering diagnoses believe this deceitful intent can be inferred through rigorous empiricism, and they consider the practice of substituting “softer” terms for malingering unethical under American Psychological Association (APA) guidelines that require findings to be presented as unambiguously as possible (244). Opponents of malingering diagnoses believe that intent cannot be reliably inferred from cognitive tests, and that malingering (a conscious behavior) cannot be reliably differentiated from conversion, factitious, or major depressive disorders (disorders involving conscious and unconscious behaviors; 35; 132). Boone (35) argues that the term malingering should be used only in the rare circumstances where there is incontrovertible evidence of malingering (e.g., patient admission, surveillance footage), and recommends changing the 1999 Slick et al. terminology from “diagnosis of malingered neurocognitive dysfunction” to “determination of noncredible neurocognitive function.”

Effort:  An  Insufficient  Description 

Neuropsychological assessment depends on the examinee’s full effort and accurate symptom report for the results to be valid (154). The term effort describes the amount of mental and/or physical energy expended in performing a task (246), and is often used to describe a subject’s investment in performing at his or her capacity levels (41). In the context of neuropsychological evaluations, the term effort implies a unidirectional drive to respond in a valid manner. Whereas malingering is characterized by the goal toward which effort is directed (i.e., secondary gain), effort is usually characterized by the level of investment in performing well. For example, the term poor effort is commonly used to describe persons who do not attempt to do their best (99). Unlike the term malingering, effort does not imply intent or motivation, but rather describes responding behavior in general.

Unfortunately, the term effort is often loosely and inappropriately applied in clinical and research settings, which confounds its meaning within and across disciplines (246). For example, the qualitative and quantitative components of effort have different meanings in biology, cognitive neuroscience, and neuropsychology (30; 228). From a neuropsychologist’s perspective, the term effort does not differentiate people who are not trying hard enough from those who are deliberately attempting to deceive. While failure on an effort test could reflect inadequate effort to perform well, it could also signify considerable effort toward performing poorly! As such, poor effort, inadequate effort, suboptimal effort, and similar effort descriptors may mischaracterize invalid responding behaviors. Slick and Sherman (246) recommend that clinicians use more specific terms, such as noncompliance (see below), to avoid misunderstandings. Bigler (30) argues that the term effort should be abandoned altogether.

Negative Response Bias and Noncompliance: Newer Descriptions, Similar Concepts

Other terms, less common than malingering or effort, are beginning to gain support in the response validity literature. In neuropsychology, negative response bias is characterized by an attempt to mislead examiners through inaccurate or incomplete responses (41) or the misrepresentation of abilities through performance or self-report (116). The “negative” component of the term refers to the expected negative valence of the response bias (i.e., poorer performance or worse symptom presentation). Like malingering and poor effort, negative response bias is detected when persons fail to surpass thresholds of valid performance on validity measures or on validity indicators within ability tests and/or self-report measures (116). Negative response bias describes the behavior without implying intent (35).

Slick and Sherman (246) propose describing failed validity indicators as a form of noncompliance with test instructions, with compliance defined as the attempt to complete tasks in accordance with the specific directions given for each test. Put another way, compliant patients attempt to respond correctly (84). Since the instructions for any given neurocognitive assessment explicitly state that persons should make every attempt to perform their best, examinees who fail to try their best exhibit noncompliance. Noncompliance could occur consciously or unconsciously, and does not imply intent. Noncompliance with test instructions describes both poor effort to do well and maximal effort to perform poorly (246).

Describing  Validity  Testing  and  Assessment 

As with the terms used to describe invalid responding behavior, there is no consensus terminology for describing validity assessment. Symptom validity assessment (SVA), symptom validity tests (SVTs), effort testing, effort tests, and malingering tests are common terms throughout the literature, and they are often used interchangeably. There are generally two types of validity tests or indicators: freestanding (i.e., stand-alone) measures that are administered independently of other measures, and empirically derived embedded indices, scores, or markers found within self-report or ability tests. The origin and evolution of these assessment terms deserve special consideration in order to understand their use in clinical and research settings.

Pankratz and colleagues (199; 200) originally coined the term symptom validity test (SVT) to describe a dichotomous, forced-choice paradigm that used below-chance responding to detect exaggeration or faking of one’s symptom presentation. At first, the term was used to identify conversion disorder and rule out malingering (199). Many of the original freestanding validity assessment tools used the forced-choice paradigm, and the term SVT quickly became associated with forced-choice responding on memory tests of digit and letter/word recognition (163). Over time, however, SVT has become a “catch-all” term that can be applied to any tool or index designed or derived to evaluate the validity of symptoms and test performance (106). The term SVT can describe freestanding measures of all formats (not just forced-choice paradigms) as well as embedded validity indicators within tests of cognitive ability and self-reported symptom inventories (51; 105; 178). Similarly, the terms symptom validity assessment and symptom validity testing have evolved to encompass assessment of both symptom exaggeration and invalid responding on ability tests.
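
The below-chance logic behind the original forced-choice SVTs can be made concrete with the binomial distribution. On a two-alternative forced-choice test, an examinee with no knowledge of the answers would be expected to score about 50% correct by guessing alone, so a score significantly below that level suggests that the correct answers were known and deliberately avoided. The Python sketch below computes the one-tailed probability of obtaining a given score or lower by guessing. It is illustrative only: the function name, the default guessing probability of .5, the .05 threshold, and the 50-item example are assumptions, not parameters of any published test.

```python
from math import comb


def below_chance_probability(correct: int, n_items: int, p_guess: float = 0.5) -> float:
    """Return P(X <= correct) for X ~ Binomial(n_items, p_guess).

    This is the one-tailed probability that pure guessing would produce a
    score at or below `correct` on a forced-choice test with `n_items` items.
    """
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    if not 0 <= correct <= n_items:
        raise ValueError("correct must lie between 0 and n_items")
    if not 0.0 < p_guess < 1.0:
        raise ValueError("p_guess must lie strictly between 0 and 1")
    # Sum the binomial probabilities for every score from 0 up to `correct`.
    return sum(
        comb(n_items, k) * p_guess**k * (1.0 - p_guess) ** (n_items - k)
        for k in range(correct + 1)
    )


if __name__ == "__main__":
    # Hypothetical examinee: 18 of 50 two-alternative items answered correctly.
    p = below_chance_probability(18, 50)
    # A one-tailed alpha of .05 is assumed here as the significance threshold.
    print(f"P(score <= 18 by guessing) = {p:.4f}; below chance: {p < 0.05}")
```

Because the calculation depends only on the number of items and the number of response options, the same function applies to any forced-choice format by adjusting `p_guess` (e.g., .25 for four alternatives).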

Effort testing and effort tests are other common terms used alongside symptom validity assessment and symptom validity tests (99; 151; 175). Green (95; 99) routinely refers to his freestanding measures as effort tests. Clinicians and researchers may prefer the term effort tests to describe methods that evaluate whether cognitive or physical effort is sufficient to produce valid data (96). Additionally, embedded validity indicators are sometimes called embedded effort measures (42).

As described earlier, the term effort implies a unidirectional motivation to perform at one’s best. Effort tests and effort testing, accordingly, are purported to detect poor or suboptimal effort to perform at capacity levels. By this logic, effort test failure should equate to insufficient effort. However, as Slick and Sherman (246) note, effort test failure could also indicate high levels of effort toward performing below one’s actual capacity. As such, the term effort testing may be an insufficient description of what is being measured.

Proposed  Terminology  Changes 

Clearly, the plethora of terms used to describe invalid responding behavior and the tools used to detect it presents communication challenges among clinicians, researchers, and the general public. Some terms, like malingering, are pejorative. The term malingering carries legal ramifications, and many clinicians are hesitant to label someone a “malingerer.” Other terms, like effort, are controversial. As it is currently used in the neuropsychology literature, the term effort insufficiently describes its own construct. Terms like symptom validity assessment and symptom validity tests are modern misnomers that have evolved beyond their original meanings (106).

Several proposals have recently been made to clarify the terminology. Slick and Sherman (246) suggest using noncompliance detection measures in lieu of effort tests when describing response validity tools.² Bianchini, Curtis, and Greve (27) use the term cognitive performance validity tests (CPVTs) to describe freestanding measures or indices used to determine whether an individual has underperformed on tests of perceptual and/or cognitive ability. Recently, Larrabee (156) and Bigler (30) each addressed invalid responding terminology and, in agreement, advocated abandoning the term effort in favor of two distinct terms: symptom validity and performance validity. Under this framework, symptom validity would describe only the accuracy of one’s symptom presentation on self-report measures such as the Personality Assessment Inventory (PAI; 186), and performance validity would describe the veracity of ability task performance on freestanding measures or embedded validity indices derived from existing neurocognitive tests (156). Accordingly, symptom validity tests (SVTs) would be used to detect symptom exaggeration, and performance validity tests (PVTs) would detect underrepresentation of one’s abilities.

The litany of terms used to describe invalid responding behavior, the measures designed to detect it, and its assessment presents a tautological problem for clinicians, researchers, and lay individuals (30). Several scholars have proposed interesting and conceptually convincing arguments for a new vernacular, favoring more specific terms like performance validity to describe ability representation and symptom validity to describe symptom presentation (30; 156). This author believes that the more specific these terms become, the less generalizable they are to settings outside a given field. Instead of carving out niche words for specific uses (e.g., cognitive performance validity tests), this author argues that researchers and clinicians should opt for simpler, more global terms that accurately describe constructs. Despite their previously mentioned limitations, symptom validity test and symptom validity assessment have been adopted as “catch-all” terms largely because of their ease of use and universal application (106). In the field of neuropsychology, an SVT generally describes anything designed to detect invalid responding, regardless of whether symptom presentation is considered.

² Presumably, “noncompliance detection assessment” would describe the assessment of noncompliance.

If the ultimate goal of modern symptom validity tests and symptom validity assessment is to determine response validity, then the terms response validity tests (RVTs) and response validity assessment (RVA) appear more appropriate. Response validity tests can describe self-report measures, performance measures, freestanding measures, or embedded indices without implying symptom involvement. Response validity assessment can describe the assessment of all types of invalid responding. The terms response validity tests and response validity assessment are simple, global, and, most importantly, accurate. These terms also avoid the more cumbersome conceptual descriptions that would otherwise eventually emerge, such as symptom and performance validity (SPV), symptom and performance validity assessment (SPVA), and symptom and performance validity tests (SPVTs).

The term response validity assessment also lends itself to impartial and nonjudgmental descriptions of invalid responding behavior. Malingering, poor effort, noncredible responding, negative response bias, and noncompliance all impart some sense of blame onto the examinee, deserved or not. Invalid responding, used frequently thus far in this literature review, fits easily into the response validity test/assessment framework. For example, one or more failed response validity tests during a neurocognitive assessment could raise suspicion of invalid responding, and the overall assessment’s validity could be questioned. Consistent with Boone’s (34) recommendations, invalid responding describes a behavior without implying intent.

While this author supports the use of the terms response validity tests, response validity assessment, and invalid responding, it would be imprudent in a literature review to retroactively apply these terms to previous research. Accordingly, when describing prior research, this dissertation will use the terms found in the respective sources, and care will be taken to accurately convey the authors’ intended meaning. For sections written specifically about this dissertation project, however, the terms response validity tests, response validity assessment, and invalid responding will be used.

Research  Designs 

Given the drastic economic cost of undetected invalid responding in disability and legal compensation claims, researchers have sought to identify the prevalence and presence of invalid responding. To determine base rates of invalid responding and understand how invalid responding behavior manifests in various settings and clinical contexts, researchers have commonly used the following research designs: case studies, differential prevalence designs, simulation studies (i.e., analogue research on dissimulation), and known-groups designs (155; 192; 217; 218). Each design carries unique methodological strengths and weaknesses. Case studies, for example, are useful for generating qualitative information and hypotheses for future research. Early malingering research using case studies (121; 200) eventually led to the development of the Portland Digit Recognition Test (PDRT; 31; 155). While the generalizability of case study results is clearly limited by small sample sizes and other methodological weaknesses, a case study design is often the only practical design for research on rare syndromes, diseases, or conditions (155).

Differential prevalence designs allow researchers to describe outcome frequencies in a given population without systematically investigating the cause of the differences (192). Researchers using a differential prevalence design infer a priori that two or more samples will have a different prevalence of a condition (e.g., clinical referrals for lung problems compared to injury litigants). Differential prevalence designs have been used to describe the effect sizes associated with varying levels of financial compensation on neuropsychological assessment performance (32).

Differential prevalence designs, like case studies, are significantly limited in their internal and external validity. By their nature, these designs do not incorporate independent criteria for the topic of interest (i.e., feigned performance); as a result, researchers cannot determine who or how many in each group are dissimulating. The design allows analysis of only overall differences, based on the assumption that groups will have different rates of malingering (155; 192; 218). While Rogers (218) argues that the differential prevalence design is the weakest methodology for symptom validity assessment research, others contend the design can be useful for gaining a preliminary understanding of how symptoms present in understudied clinical populations (192). The differential prevalence design, like correlational studies, is useful for identifying group phenomena that merit further exploration using more rigorous scientific methods.

Simulation or “analogue malingerer” research employs a quasi-experimental design in which subjects can be randomly assigned to different scenario groups (218). Typically, one group is instructed to exaggerate symptoms or feign impairment related to a condition of interest. This “biased” or “dissimulator” group is compared to another group of randomly assigned subjects who are instructed to perform normally or “honestly” (192). The biased groups are often given a real or pretend incentive (e.g., college credit, monetary compensation) to enhance their motivation to perform like a “real-world malingerer.” Researchers can use this design with clinical groups to determine the effects of invalid responding above and beyond the effects of the clinical condition (e.g., ADHD, mild TBI).

Simulation study designs allow researchers to better understand how invalid symptom expression and performance manifest on a given assessment tool. The design lends itself to statistical analyses that can identify the cutoff scores on an assessment that best differentiate the simulation groups from the comparison groups (192). In simulation studies, invalid responding base rates are known (e.g., 33% or 50%), and known base rates can be used to generate predictive value statistics tables (see below).
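
The link between a known base rate and predictive value follows from Bayes’ theorem: positive predictive value (PPV) is the proportion of examinees flagged as invalid who truly responded invalidly, and negative predictive value (NPV) is the proportion of examinees who passed who truly responded validly. The Python sketch below shows how a predictive value table could be generated for a single validity indicator across several base rates. The function name and the sensitivity (.80) and specificity (.90) values are hypothetical placeholders chosen for illustration, not results from any study cited here; the base rates mirror the 33% and 50% examples above plus an assumed lower rate of 10%.

```python
def predictive_values(
    sensitivity: float, specificity: float, base_rate: float
) -> tuple[float, float]:
    """Return (PPV, NPV) for a validity indicator at a given base rate.

    All three arguments are proportions between 0 and 1.
    """
    for name, value in (("sensitivity", sensitivity), ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    if not 0.0 < base_rate < 1.0:
        raise ValueError("base_rate must lie strictly between 0 and 1")

    # Expected share of the whole sample in each cell of the 2 x 2 table.
    true_pos = sensitivity * base_rate
    false_neg = (1.0 - sensitivity) * base_rate
    true_neg = specificity * (1.0 - base_rate)
    false_pos = (1.0 - specificity) * (1.0 - base_rate)

    # PPV: of all examinees flagged as invalid, the share who truly responded invalidly.
    flagged = true_pos + false_pos
    ppv = true_pos / flagged if flagged > 0 else float("nan")
    # NPV: of all examinees who passed, the share who truly responded validly.
    passed = true_neg + false_neg
    npv = true_neg / passed if passed > 0 else float("nan")
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical indicator: sensitivity .80, specificity .90 (placeholder values).
    for base_rate in (0.10, 0.33, 0.50):
        ppv, npv = predictive_values(0.80, 0.90, base_rate)
        print(f"base rate {base_rate:.0%}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

With these placeholder values, PPV falls from about .89 at a 50% base rate to about .47 at a 10% base rate while NPV rises, which illustrates why predictive value statistics are only as accurate as the base rate used to compute them.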

Simulation studies enable researchers to explore novel questions under well-controlled experimental conditions. These studies benefit from group randomization, matching, and knowing which subjects are simulating deficits, ultimately leading to high levels of internal validity. However, the external validity of simulation studies and their generalizability to real-world settings are significantly limited (155; 192; 218). Simulation studies often use convenience samples of college undergraduates for their simulation groups, whose performance may not reflect that of actual malingerers in medico-legal settings. The monetary reward in a research study (typically less than $100) pales in comparison to potential gains in a forensic context, as do the consequences of “being caught” as an exaggerator (192). Researchers can improve the generalizability of simulation study results by recruiting participants from diverse, community-based populations and adding clinical comparison groups (see below) to research designs.

Studies using a known-groups design classify feigning and non-feigning groups a priori using an independent standard or criterion (e.g., failure of an RVT), and then systematically analyze the similarities and differences between the “known” groups (192; 217; 218). Known-groups research is thought to have high generalizability, as data are usually collected in “real-world” contexts, such as from litigants or disability claimants undergoing a neuropsychological evaluation (192). Known-groups designs address major limitations of simulation studies by using data collected in settings where invalid responding is likely to occur and where examinees are likely to be motivated by real-world incentives (218).

The main problem with known-groups designs is the reliable and accurate classification of the criterion groups. It is highly unlikely that people in real-world settings with real-world incentives will openly admit after testing that they did, in fact, perform worse than their actual abilities. Researchers must therefore rely on operationally defined “gold standards” such as the 1999 Slick et al. criteria for malingered neurocognitive dysfunction (MND) or the 2005 Bianchini et al. criteria for malingered pain-related disability (MPRD). As with any “gold standard” in a burgeoning scientific field, these proposed criteria have critics (35) and are subject to change based on scientific developments (246). As a result, known-groups designs use a “best-available classification” system that may ultimately require retroactive group reclassification and data reanalysis. Known-groups designs are also limited because researchers cannot randomly assign participants to groups of malingerers and non-malingerers (155). Additionally, researchers cannot be fully certain of the base rates of invalid performance in their samples, which limits the accuracy of the predictive value statistics that can be generated from the data. Consequently, researchers using a known-groups design must rely on base rate estimates to generate their predictive value statistics (183).

Recently, Rogers (218) recommended combining simulation and known-groups designs into a “combined groups” design. This approach would allow researchers to benefit from both the internal validity of simulation studies and the generalizability of known-groups designs (192; 218). The additional clinical group allows researchers to verify that between-group differences reflect invalid performance rather than facets of a clinical condition (61). However, adding clinical comparison groups to a simulation study is not without risk to internal validity. The added clinical group would preclude full-study randomization of group assignment, and there is no assurance that the clinical group would be classified correctly (155). Researchers can add manipulation checks to their simulation study protocol to increase confidence that the groups are responding in the desired manner.

Stevens and Merten (249) recently used a combined groups design to compare reaction time latency and variability among three groups of forensic subjects with and without brain injury and a fourth group of experimental simulators. They found that subjects who failed a freestanding RVT performed significantly worse on cognitive testing, but that healthy simulator performance overlapped considerably with real-world clinical group performance (249). Taken independently, a simulator study would have provided insufficient information for use in a clinical setting, and a known-groups study using only a pass/fail criterion would not have had a “known simulator” comparison group. By combining these research design elements, the authors were able to enhance the internal and external validity of their results. Unfortunately, relatively few researchers have adopted this approach to date, highlighting the need for additional research using the combined groups design (192).

Base  Rates  of  Invalid  Responding 

Base  Rates  of  Invalid  Responding  in  Civilian  Populations 

Using the aforementioned research designs, researchers have been able to estimate base rates of invalid responding in a variety of settings and contexts. In general, the prevalence of invalid responding varies depending on the clinical conditions being evaluated and the context of the evaluation (220). Owing largely to a surge of interest and research in the construct of malingering in the 1990s (254), researchers were able to estimate base rates of malingering or invalid responding in a host of civil and criminal settings (154).

Generally, noncredible performance in clinical assessment becomes more likely in forensic settings and other contexts where outcomes involve the possibility of secondary gain (163; 224). Based on a survey of 131 board-certified clinical neuropsychologists experienced in forensic work, which encompassed 33,531 neuropsychological evaluations, Mittenberg, Patton, Canyock, and Condit (183) found that 32.7% (95% CI: ±4.10) of disability cases, 30.4% (95% CI: ±3.64) of personal injury cases, 22.8% (95% CI: ±5.83) of criminal cases, and 8% (95% CI: ±1.56) of general medical cases involved probable malingering and symptom exaggeration after adjusting for referral source. Mittenberg and colleagues (183) also reported that base rates of probable malingering (also adjusted for referral source) differed considerably by diagnosis, with the three highest being mild head injury (41.2%; 95% CI: ±4.51), fibromyalgia or chronic fatigue (38.6%; 95% CI: ±5.54), and pain or somatoform disorders (33.5%; 95% CI: ±5.50).

Larrabee (153) combined results from 11 studies published between 1978 and 2002 that contained information relevant to malingering base rates in mild traumatic brain injury (mild TBI) litigants. Of the 1,363 subjects, 40% were identified as having performance deficits associated with malingering, with individual study rates ranging from 15% (265) to 64% (115). Despite the wide range of research methodologies used in the 11 studies, Larrabee’s (153) results closely matched the adjusted 41.2% malingering base rate for mild head injury cases reported by Mittenberg and colleagues (183). The similarity of these base rates of probable malingering in compensation-seeking or litigating mild TBI cases increases confidence in the collective findings.

Two studies conducted by Rogers, Sewell, and Goldstein (222) and Rogers, Salekin, Sewell, Goldstein, and Leonard (221) reported forensic psychologists’ estimates of malingering base rates in forensic and nonforensic settings. The 1994 study reported malingering base rates of 15.7% and 7.4% in forensic and nonforensic settings, respectively, and the 1998 study reported base rates of 17.4% and 7.2%, respectively. However, Berry and Schipper (25) contend that these data are limited by the lack of well-validated objective malingering assessment techniques available at the time (i.e., the mid-1990s), and propose that these relatively low rates likely represent “floor” rates of psychiatric malingering. Berry and Schipper (25) also argue that successful malingerers by definition escape detection and would thus not be included in the base rates obtained via the surveys conducted by Rogers et al. (221; 222) and Mittenberg et al. (183). As such, base rate data obtained from surveys may underestimate the actual base rates of malingering.

Recent studies report malingering and symptom exaggeration in a sizable minority of TBI patients, even in nonforensic or non-litigating settings. Kirkwood and Kirk (146) examined 193 consecutively referred mild TBI patients aged 8-17 and found that 17% of them failed the Medical Symptom Validity Test (MSVT; 94), despite no apparent external incentive to perform poorly. Locke, Smigielski, Powell, and Stevens (165) reported a 21.8% failure rate on the Test of Memory Malingering (TOMM; 263) in a sample of 87 consecutively referred, treatment-seeking adult patients with acquired brain injury of all severities. Contrary to Mittenberg et al.’s (183) survey findings, in which the observed malingering base rate in mild head injury cases was vastly greater than in moderate-to-severe head injury cases, Locke and colleagues (165) found no statistically significant difference in TOMM failure rates between mild and moderate-to-severe TBI patients.

In Social Security disability evaluations, Chafetz (49) reported that adults performed at or below chance levels (i.e., definite malingering per the 1999 Slick et al. criteria) 36.5-47.4% of the time on either the TOMM or MSVT, and that 45.8-59.7% of claimants failed one or both of the SVTs using their cut score criteria (i.e., probable malingering). As cited by Chafetz (48), Miller and colleagues reported that more than half of their Social Security disability claimant group failed at least one SVT. Other studies of Social Security disability evaluations identified probable or definite malingering in 42-45% of cases (48; 52).
Base Rates of Invalid Responding in Active Duty U.S. Service Member and Veteran Populations

Base rates of invalid neuropsychological assessment performance among active
duty U.S. military Service Members and Veterans deserve special consideration, since it
is well established that military service involves health risks that can lead to physical and
psychological disability (122; 123). Two military-specific evaluations are employed to
determine the extent of such disability: medical evaluation boards (MEBs) and compensation
and pension (C&P) examinations (65; 209). MEBs are comprehensive
evaluations that determine whether active duty Service Members are medically fit for
duty. If an MEB determines that a Service Member must be medically separated from duty,
the Service Member is typically given benefits commensurate with the level of disability
(209). C&P evaluations are a separate disability evaluation within the Veterans
Administration (VA) healthcare system, and they also determine the extent (i.e.,
percentage) of a Veteran’s disability that is related to his or her military service (i.e.,
service-connected disability percentage or “service connection”). Unlike an MEB, a
C&P evaluation can be initiated by a Veteran at any time after the end of his or her military
service and can potentially result in a service connection. Service connection provides
monthly monetary compensation and other ancillary benefits such as tuition assistance
for dependents, access to services, and assistance in the home (284).

After more than ten years of warfighting in Iraq and Afghanistan, several studies
have explored the prevalence of invalid responding in military-related
neuropsychological assessment. A recent survey of 168 psychologists performing
neuropsychological assessments in the VA healthcare system estimated that 42% of Veterans
fail RVTs during C&P evaluations and 25% fail RVTs during routine clinical referrals
(283). Given that survey data tend to underestimate base rates of a given condition (25),
one might expect the true rates to be higher. Studies using various research
methodologies to examine invalid responding among active duty Service Members and
Veterans have reported RVT failure rates ranging from 17% to 68% (170). Taken at face
value, these numbers can be quite alarming. However, reviewing these studies with a
critical lens allows a better understanding of how evaluation context and other factors
may influence invalid responding base rates in military populations (193).

3 “Veterans” in this manuscript refers to any person who has served in the military at any point and
who is eligible for VA benefits. The term “combat Veteran” will be used to describe the smaller subset of
Veterans with combat deployment experience.

Armistead-Jehle (8) reported a 58% failure rate on the MSVT (94) in 45 U.S.
Veterans referred for clinical evaluation of possible postconcussive symptoms at a
Veterans Affairs Medical Center (VAMC). There were no differences in gender, age,
education, ethnicity, or previous posttraumatic stress disorder (PTSD) or substance use
disorder diagnoses between those who passed the MSVT and those who failed
it. Additionally, symptom validity scales from the Personality Assessment Inventory
(PAI; 186) designed to measure self-reported exaggeration of negative symptoms were
not significantly different between groups, suggesting MSVT failure was the result of
underperformance in a context heavily associated with secondary financial gain.

Nelson and colleagues (193) examined effort test performance among 119 U.S.
Veterans at a Midwestern VAMC. Their sample varied in terms of self-reported
concussion history (yes or no), evaluation context (C&P exam or research study), and
deployment history (OIF/OEF or non-OIF/OEF). Similar to Armistead-Jehle’s (8)
reported 58% MSVT failure rate, Nelson and colleagues (193) reported a 59% (26/44)
failure rate on the Victoria Symptom Validity Test (VSVT; 243) among the C&P sample.
However, the research sample had only a 10.7% RVT failure rate (8/75). When
controlling for effort, the researchers found similar neuropsychological profiles among
the Veterans with a history of concussion, supporting previous findings that effort plays a
significant role in neuropsychological test outcomes (82; 99).

Drawing from a mixed clinical sample of 286 U.S. Veterans from a VAMC,
Axelrod and Schutte (13) reported that patients without a dementia profile had a 31.5%
(70/222) failure rate on at least one of the MSVT’s “easy” subtests (i.e., Immediate Recall [IR],
Delayed Recall [DR], and Consistency [CNS]). Of note, only 1% of the overall sample
consisted of C&P referrals, while 32% of the sample was referred by mental health providers.
This disparity highlights that invalid responding may occur in routine clinical evaluations
without a direct potential for secondary gain.

Young, Sawyer, Roper, and Baughman (284) examined a sample of 259 Veterans
who were referred for neuropsychological assessment at a VA hospital. Of the sample, 74%
were outpatient referrals, 22% were seen for C&P evaluations, and 4% were seen during
inpatient hospitalizations. While Veterans with dementia or psychotic disorders were
excluded from the sample, 89.6% of the sample was determined to meet criteria for at
least one psychiatric diagnosis. The authors reported that 44% of their overall sample
(n = 115) failed the WMT (93), with C&P claimants failing the WMT at a much higher rate
(71%; n = 41) than clinical outpatient referrals (37%; n = 71). Additionally, the average
service connection percentage was significantly higher for those who failed the WMT
(M = 39.2%, SD = 33.8%) than for those who passed it (M = 27.2%, SD = 33.8%).
Russo (226) reported a 68% WMT failure rate among 38 consecutively referred
OIF/OEF combat Veterans diagnosed with TBI who presented for follow-up
neuropsychological testing in a VAMC setting. Seventy percent of the sample was service
connected for disability at the time of the evaluation. Of the 26 combat Veterans who
failed the WMT, 44.8% failed all three of the “easy” subtests (IR, DR, and CNS).

In the first study exploring invalid responding among active duty Service
Members, Whitney and colleagues (280) reported a 17% MSVT failure rate in a sample
of 23 OIF/OEF combat Veterans (both active duty and separated from service)
reporting mild TBI at a VAMC. While their sample consisted of both individuals on
active duty at the time of testing (n = 9) and Veterans no longer on active duty (n = 14),
all four of the MSVT failures were from the active duty subset. These four Service Members
reported sustaining their mild TBI more than five months prior to evaluation, with a
concussion-related loss of consciousness lasting 10 minutes or less. Three of the four Service
Members also failed the TOMM (263), and none met criteria for the MSVT’s dementia
profile.

Armistead-Jehle and Hansen (10) administered three stand-alone RVTs to a
sample of 85 active duty military Service Members that largely consisted of persons
reporting a history of mild TBI/concussion (84.7% of the sample) and/or mental health
conditions (78.8% had a psychiatric diagnosis). Only seven (8.2%) participants were
involved in an MEB process, and none were involved in litigation; as such, the authors
concluded that the majority of the participants lacked discernible motivation for
secondary gain. Even without a known incentive to perform poorly, the overall sample had
failure rates of 20% on the MSVT, 15% on the Nonverbal MSVT (NV-MSVT; 97), and
11% on the TOMM.

While these results appear to be consistent with Whitney et al.’s (280) findings
(17% MSVT failure rate), Armistead-Jehle and Hansen (10) considered that their
sample’s overall results may have underestimated malingering prevalence because the
sample included a disproportionate percentage of officers (54.4%), most of whom were field
grade (i.e., middle-to-senior level) officers attending a rigorous, year-long military
professional development course (Intermediate Level Education; ILE). Among the
non-ILE sample (n = 47), which the authors contended may better represent an active duty
military population with respect to age, rank, education, and ethnicity, the authors
reported a 30% failure rate on the MSVT (n = 14), a 21% failure rate on the NV-MSVT
(n = 10), and a 15% failure rate on the TOMM (n = 7). The non-ILE group’s failure rates
were higher than the ILE group’s 8% failure rates on the MSVT (n = 3), NV-MSVT
(n = 3), and TOMM (n = 3)4. The authors argued that underlying variables relating to
subgroup membership among active duty and Veteran samples should be closely
examined when conducting malingering research in a military population.

Armistead-Jehle and Buican (9) recently conducted the most comprehensive study
to date of performance validity among Service Members on active duty. The study’s
sample consisted of 335 Service Members receiving neuropsychological evaluations at a
military TBI clinic, with 117 undergoing an MEB and 218 completing a
neuropsychological evaluation for non-MEB/clinical purposes. The authors reported an
overall WMT (93) failure rate of 41.8%. The authors also reported that the failure rate of
those undergoing an MEB (63/117; 53.8%) was significantly higher than that of those
undergoing a non-MEB/clinical evaluation (77/218; 35.3%).

4 The same three ILE students each failed the MSVT, NV-MSVT, and the TOMM (P. Armistead-Jehle,
personal communication, January 11, 2013).

In light of the wide range of invalid responding rates in military samples,
McCormick and colleagues (170) recently conducted a well-controlled, prospective,
multisite study of 214 OIF/OEF combat Veterans (both active duty and separated from
service) in “research-only” and “dual”5 conditions. None of the evaluations were
conducted in the context of a C&P evaluation. The authors reported an overall WMT
failure rate of 25%, with a 42% (33/78) dual group failure rate and a 15% (21/136)
research group failure rate. Failure rates did not differ between those with and without
service-connected disability. These results reiterate Nelson and colleagues’ (193)
findings that invalid responding rates vary across evaluation contexts.

By combining overall RVT failure rates reported among non-demented and non-psychotic
participants in the nine studies from this literature review section (8-10; 13;
170; 193; 226; 280; 284), one would identify 486 RVT failures out of 1340
neuropsychological evaluations, a 36.3% failure rate6. However, this figure combines
failure rates for disability evaluations, clinical referrals, and research study participants.
Among studies that directly reported the data, active duty Service Members and Veterans
had a 13.7% RVT failure rate7 in research-only evaluation contexts (170; 193), but had a
59.4% RVT failure rate8 in military-related medical disability evaluations (9; 193; 284).

5 Primarily a clinical evaluation to inform patient care, with patient consent for information to be
included in a research study.

6 26 failures/45 evaluations using MSVT (Armistead-Jehle, 2010); 34/119 using various RVTs
(Nelson et al., 2010); 70/222 using MSVT (Axelrod & Schutte, 2010); 115/259 using WMT (Young et al.,
2012); 4/23 using MSVT (Whitney et al., 2009); 26/38 using WMT (Russo, 2012); 54/214 using WMT
(McCormick et al., 2013); 17/85 using MSVT (Armistead-Jehle & Hansen, 2011); and 140/335 using
WMT (Armistead-Jehle & Buican, 2012).

7 8 failures/75 evaluations using various RVTs (Nelson et al., 2010) and 21/136 using WMT
(McCormick et al., 2013).

It is clear from multiple studies with active duty Service Members and Veterans that
invalid responding rates vary depending on evaluation context (9; 170; 193), with the
highest base rates among disability evaluations.
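The pooled figures above (footnotes 6 through 8) are simple ratios of summed failures to summed evaluations. A minimal Python sketch of that arithmetic follows. The per-study counts are copied from the footnotes, while the dictionary names and the `pooled_rate` helper are illustrative. This kind of pooling weights each study by its sample size and ignores differences in RVTs, cut scores, and referral populations.

```python
# Per-study (failures, evaluations) counts, copied from footnotes 6-8.
ALL_CONTEXTS = {
    "Armistead-Jehle (2010), MSVT": (26, 45),
    "Nelson et al. (2010), various RVTs": (34, 119),
    "Axelrod & Schutte (2010), MSVT": (70, 222),
    "Young et al. (2012), WMT": (115, 259),
    "Whitney et al. (2009), MSVT": (4, 23),
    "Russo (2012), WMT": (26, 38),
    "McCormick et al. (2013), WMT": (54, 214),
    "Armistead-Jehle & Hansen (2011), MSVT": (17, 85),
    "Armistead-Jehle & Buican (2012), WMT": (140, 335),
}
RESEARCH_ONLY = {
    "Nelson et al. (2010), various RVTs": (8, 75),
    "McCormick et al. (2013), WMT": (21, 136),
}
DISABILITY = {
    "Nelson et al. (2010), VSVT": (26, 44),
    "Young et al. (2012), WMT": (41, 58),
    "Armistead-Jehle & Buican (2012), WMT": (63, 117),
}


def pooled_rate(studies):
    """Return (total failures, total evaluations, pooled failure proportion).

    Summing raw counts weights each study by its sample size; no adjustment
    is made for differences in RVTs, cut scores, or referral populations.
    """
    failures = sum(f for f, _ in studies.values())
    evaluations = sum(n for _, n in studies.values())
    return failures, evaluations, failures / evaluations


for label, studies in [("All contexts", ALL_CONTEXTS),
                       ("Research-only", RESEARCH_ONLY),
                       ("Disability evaluations", DISABILITY)]:
    f, n, rate = pooled_rate(studies)
    print(f"{label}: {f}/{n} = {rate:.1%}")
```

Run as written, the sketch prints the three pooled rates rounded to one decimal place: 486/1340 = 36.3%, 29/211 = 13.7%, and 130/219 = 59.4%.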

Base Rates Summary

As Rogers and colleagues noted in 1993, the prevalence of malingering or invalid
responding appears to vary based on context, diagnosis, and population. Perhaps most
surprising are the reported base rates from samples who, according to the authors, had
no discernible incentive to perform less than their best. Based on research involving
adults (165; 193; 221; 222) and children (146) undergoing neuropsychological evaluation
without a known external incentive, one can conservatively estimate a 10% base rate of
invalid responding on all neurocognitive evaluations, regardless of setting or assessment
context. As the potential for financial incentive or other secondary gain increases (e.g.,
disability evaluations, legal claims), the base rate of invalid responding also increases.

Larrabee, Millis, and Meyer (158) contend that, in the presence of potential
secondary gain, the likelihood of invalid responding increases to about 40%, plus or
minus 10%. Based on the studies reviewed in this section of the manuscript, it is
reasonable to consider that range appropriate in civilian cases where secondary gain is
most likely and diagnosis of a condition is most subjective. Among U.S. Service
Member and Veteran populations, however, invalid responding may occur in as many as
60% of disability evaluations, in which secondary gain is possible (9; 193; 284). Even in
settings where secondary gain was not overt, invalid responding rates commonly ranged
from 8% to 35% among military populations (9; 10).

8 26/44 using VSVT (Nelson et al., 2010); 41/58 using WMT (Young et al., 2012); and 63/117
using WMT (Armistead-Jehle & Buican, 2012).

In addition to secondary gain, factors such as fatigue, stress, medical conditions,
psychiatric conditions, medications, time since injury, initial injury severity, testing
environment, examiner skill, assessment instructions, and language/cultural
considerations can contribute to invalid neuropsychological test performance (11; 181).
Collectively, the findings presented in this section underscore the importance of response
validity testing in all neuropsychological evaluations, even in contexts where secondary
gain may not be overt (41; 116).

Diagnostic Validity and Classification Accuracy Statistics

The diagnostic validity of a test refers to its ability to differentiate subjects with
and without a given condition. Classification accuracy statistics, such as sensitivity (SN),
specificity (SP), positive predictive value (PPV), negative predictive value (NPV), and
likelihood ratios (LRs), describe diagnostic validity (15; 135; 173). In contrast to group
statistics such as t-tests or ANOVAs, which test for differences between groups, classification
accuracy statistics are considered to be individual statistics that are useful for determining
which subjects are contributing to group differences (157).

Classification accuracy statistics include values that are unique to an instrument
and values that rely on the prevalence of a condition. Sensitivity, specificity, and hit rate
(HR) indices can be calculated for any instrument without incorporating the prevalence of a
given condition (although the hit rate, unlike sensitivity and specificity, also varies with the
proportion of cases in the sample). Predictive value statistics, on the other hand, explicitly
incorporate base rates of a condition (15). Positive predictive value and negative predictive value,
also known as positive and negative predictive power (PPP; NPP), are examples of
predictive value statistics.

To better illustrate classification accuracy statistics, one can refer to the
contingency table featured in Table 1. Sensitivity (SN) is defined as the probability of a
positive test result in persons who have the condition or characteristic of interest, or the
true positive rate. It is the ratio of the number of true positives (TP) to the number of true
positives plus false negatives (TP+FN). The formula for sensitivity is:

(1) SN = TP/(TP+FN).

Specificity (SP), on the other hand, is the probability of a negative test result in persons
who do not have the condition or characteristic of interest, or the true negative rate. It is
the ratio of the number of true negatives (TN) to the number of true negatives plus false
positives (TN+FP). The formula for specificity is:

(2) SP = TN/(TN+FP).

Accordingly, the false positive rate can be calculated using the following formula:

(3) FPrate = (1 - SP).

The hit rate (HR) or overall diagnostic power of a test describes the overall correct
classification ability of a measure. It is the ratio of total correct classifications (TP+TN)
to the total number of subjects evaluated (N). The hit rate is also known as the efficiency of
the test, or the probability that the test outcome and actual diagnostic condition agree.
The formula for the hit rate index is:

(4) HR = (TP+TN)/N.

The base rate (p) is defined as the prevalence or frequency of a condition of interest in a
given population. The formula for base rate is:

(5) p = (TP+FN)/N.

Because they are ratios, the values of sensitivity, specificity, hit rate, and base
rate range from 0 to 1.00.
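Equations (1) through (5) can all be read off the four cells of a 2 x 2 contingency table. The Python sketch below assumes the usual layout of condition present or absent by test positive or negative. The `Confusion` class name and the counts in the example are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from a 2 x 2 contingency table (condition status by test result)."""
    tp: int  # condition present, test positive
    fp: int  # condition absent, test positive
    fn: int  # condition present, test negative
    tn: int  # condition absent, test negative

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def sensitivity(self) -> float:          # Equation (1): TP/(TP+FN)
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:          # Equation (2): TN/(TN+FP)
        return self.tn / (self.tn + self.fp)

    def false_positive_rate(self) -> float:  # Equation (3): 1 - SP
        return 1 - self.specificity()

    def hit_rate(self) -> float:             # Equation (4): (TP+TN)/N
        return (self.tp + self.tn) / self.n

    def base_rate(self) -> float:            # Equation (5): (TP+FN)/N
        return (self.tp + self.fn) / self.n


# Hypothetical sample: 50 invalid responders and 150 valid responders.
c = Confusion(tp=40, fp=15, fn=10, tn=135)
print(c.sensitivity(), c.specificity(), c.hit_rate(), c.base_rate())
# -> 0.8 0.9 0.875 0.25
```

With these counts, HR = p(SN) + (1 - p)(SP) = 0.25(0.8) + 0.75(0.9) = 0.875. The hit rate therefore shifts with the base rate even when sensitivity and specificity are held fixed.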

Sensitivity and specificity vary inversely across diagnostic cutoffs. If
a cutoff score is adjusted to increase a test’s sensitivity, the test’s specificity
decreases, and vice versa. For example, as one changes a cutoff score to correctly identify
more people who have a disease, one increases the likelihood of making a false positive
error (i.e., Type I error).

Receiver operating characteristic (ROC) curves are used to plot the relationship
between true positive rates and false positive rates, and can be used to determine the
overall accuracy of a test (110; 127; 172; 256). ROC graphs plot a diagnostic tool’s true
positive rate (i.e., sensitivity) as a function of its false positive rate (i.e., 1 - SP), resulting
in a graphical snapshot of classification abilities across the range of possible cutoff scores (187).
As a result, ROC curves permit researchers and clinicians to determine the optimum
cutting score on a psychometric test and display the information in a figure.

The area under the ROC curve (AUC) describes the overall diagnostic power of
the test on a scale from 0 to 1, where 0 represents a perfectly inaccurate test and 1 indicates a
perfectly accurate test (168). An AUC of 0.50 corresponds to the diagonal of the ROC graph,
on which sensitivity equals the false positive rate at every cutoff; such a test classifies no
better than chance. Equivalently, the AUC is the probability that a randomly selected person
with the condition obtains a score more indicative of the condition than a randomly selected
person without it. The closer the AUC approaches 1, the better the test can identify true
positives without making false positive errors. AUC values from 0.7 to less than 0.8 are considered
acceptable, 0.8 to less than 0.9 are considered excellent, and values greater than or equal
to 0.9 are considered outstanding (125).
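The cutoff trade-off, the ROC curve, and the AUC can all be computed directly from two sets of scores. The plain-Python sketch below makes three assumptions: higher scores indicate invalid responding, a case is called positive when its score is at or above the cutoff, and Youden's J (SN + SP - 1) defines the "optimum" cutoff. Youden's J is one common criterion but not the only one, and the source does not specify which criterion it uses. The sketch sweeps every observed score as a cutoff and integrates the resulting curve with the trapezoidal rule. Function names and example scores are hypothetical.

```python
def roc_points(pos_scores, neg_scores):
    """Return (FPR, TPR, cutoff) triples from the strictest to the most lenient cutoff.

    A case is classified positive when its score >= cutoff.
    """
    cutoffs = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0, float("inf"))]  # cutoff above every score: no positives
    for cut in cutoffs:
        tpr = sum(s >= cut for s in pos_scores) / len(pos_scores)  # sensitivity
        fpr = sum(s >= cut for s in neg_scores) / len(neg_scores)  # 1 - specificity
        points.append((fpr, tpr, cut))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0, _), (x1, y1, _) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def youden_cutoff(points):
    """Cutoff that maximizes Youden's J = TPR - FPR (= SN + SP - 1)."""
    return max(points, key=lambda p: p[1] - p[0])[2]


# Hypothetical scores (higher = more suggestive of invalid responding).
invalid = [6, 7, 8, 9]
valid = [1, 2, 3, 7]
pts = roc_points(invalid, valid)
print(auc_trapezoid(pts), youden_cutoff(pts))  # -> 0.90625 6
```

The resulting AUC of 0.90625 equals the proportion of invalid-valid pairs in which the invalid case scores higher, with ties counted as one half (14.5 of 16 pairs). This matches the probabilistic interpretation of the AUC.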

As described above, predictive value statistics incorporate base rates of a
condition to predict the likelihood of a measure making a correct diagnosis. Positive
predictive value (PPV) describes the probability of a given condition being present given
a positive test finding. It is the ratio of true positives (TP) to all positive scores (TP+FP),
and is expressed in the following formula:

(6) PPV = (TP)/(TP+FP).

The previous formula assumes that one knows with certainty which results were true
positives and which were false positives. Since this knowledge is rarely available,
one can instead incorporate base rates (p) into the formula:

(7) PPV = (p * SN)/[(p * SN) + (1 - p)(1 - SP)].

Negative predictive value (NPV) describes the probability of a given condition being
absent given a negative test finding. It is the ratio of true negatives (TN) to all negative
scores (TN+FN), and is expressed in the following formula:

(8) NPV = (TN)/(TN+FN).

Like PPV, the basic NPV formula assumes that one already knows which persons did and
did not have a condition. When the presence or absence of a condition is unknown, one
can incorporate base rates (p) into the formula:

(9) NPV = [(1 - p)SP]/[((1 - p)SP) + p(1 - SN)].

It should be noted that PPV and NPV vary as a function of base rates. Holding
sensitivity and specificity constant, PPV increases and NPV decreases as base rates rise,
whereas NPV increases and PPV decreases as base rates fall.
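Because Equations (7) and (9) depend on p, the same test can perform very differently across the base rates reviewed earlier. The minimal Python sketch below assumes a hypothetical test with SN = .80 and SP = .90 and evaluates it at the approximate 10%, 40%, and 60% base rates discussed in the Base Rates Summary. The function names are illustrative.

```python
def ppv(sn: float, sp: float, p: float) -> float:
    """Equation (7): probability the condition is present given a positive result."""
    return (p * sn) / (p * sn + (1 - p) * (1 - sp))


def npv(sn: float, sp: float, p: float) -> float:
    """Equation (9): probability the condition is absent given a negative result."""
    return ((1 - p) * sp) / ((1 - p) * sp + p * (1 - sn))


SN, SP = 0.80, 0.90  # hypothetical test characteristics
for p in (0.10, 0.40, 0.60):  # approximate base rates from the summary above
    print(f"p = {p:.2f}: PPV = {ppv(SN, SP, p):.2f}, NPV = {npv(SN, SP, p):.2f}")
# p = 0.10: PPV = 0.47, NPV = 0.98
# p = 0.40: PPV = 0.84, NPV = 0.87
# p = 0.60: PPV = 0.92, NPV = 0.75
```

At a 10% base rate, fewer than half of this hypothetical test's positive results would be true positives, even though its sensitivity and specificity are unchanged.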
Likelihood ratios (LRs) indicate how much more (or less) likely a given test result is among
persons who have a condition than among persons who do not, and therefore how strongly a
positive or negative result should shift the estimated probability that the condition is present
(102). The positive likelihood ratio (LR+) is the true positive rate divided by the false positive
rate, and is expressed in the following formula:

(10) LR+ = SN/(1-SP).

Lastly, the negative likelihood ratio (LR-) is the false negative rate divided by the true
negative rate, and is expressed in the following formula:

(11) LR- = (1-SN)/SP.
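Likelihood ratios are most useful when combined with a pre-test probability through the odds form of Bayes' theorem (post-test odds = pre-test odds x LR). The source does not describe this step; it appears here as a standard illustration. The function names and the SN, SP, and pre-test values below are hypothetical.

```python
def likelihood_ratios(sn, sp):
    """Equations (10) and (11): return (LR+, LR-)."""
    lr_pos = sn / (1 - sp)   # true positive rate / false positive rate
    lr_neg = (1 - sn) / sp   # false negative rate / true negative rate
    return lr_pos, lr_neg


def post_test_probability(pre_test_p, lr):
    """Update a pre-test probability with a likelihood ratio (odds form of Bayes' theorem)."""
    pre_odds = pre_test_p / (1 - pre_test_p)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


lr_pos, lr_neg = likelihood_ratios(0.80, 0.90)  # hypothetical SN and SP
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")      # LR+ = 8.00, LR- = 0.22
print(f"{post_test_probability(0.40, lr_pos):.2f}")   # 0.84 after a positive result
print(f"{post_test_probability(0.40, lr_neg):.2f}")   # 0.13 after a negative result
```

With a 40% pre-test probability, the post-test probability after a positive result (16/19, about .84) equals the PPV from Equation (7) for the same SN, SP, and base rate. Likewise, the probability after a negative result (4/31, about .13) equals 1 - NPV.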

Existing Types of Symptom Validity Tests

The National Academy of Neuropsychology (NAN) and the American Academy
of Clinical Neuropsychology (AACN) call response validity assessment an essential part
of any neuropsychological evaluation (41; 116). To this end, clinicians can
use tests and indices specifically designed to detect invalid responding. As described
earlier, these tests are collectively known as symptom validity tests (SVTs), although they
can measure validity in both symptom report and performance.

Symptom validity tests can be independent, freestanding tests (98; 263) or
embedded validity indices derived from existing self-report or neurocognitive measures
of attention (198), memory (21), and psychomotor speed (195). Both freestanding and
embedded SVTs have unique strengths and weaknesses, and they are generally seen as
complementary tools for symptom validity assessment. Given that an examinee’s
cognitive effort may fluctuate during the course of a long neuropsychological test battery,
examiners often use multiple freestanding and embedded SVTs throughout a battery (36).
The AACN recently recommended that both freestanding and embedded SVTs be used in
neuropsychological evaluations involving the potential for secondary gain, with embedded
measures used at a minimum if time is constrained (116).


Freestanding Symptom Validity Tests

Freestanding SVTs, also known as “stand-alone” SVTs, are designed to detect
response bias or exaggeration of deficits while appearing to examinees as a cognitive test
(e.g., a test of memory, attention, or processing speed). Freestanding SVTs are considered to
be the most accurate and well-studied type of SVT (27). Though not limited to this
format, the most commonly researched freestanding SVTs use a forced-choice
paradigm (244). The essential characteristic of the forced-choice test is to identify
below-chance responding on a series of two-alternative presentations of words,
digits, or patterns. Forced-choice tests use the normal (z) approximation to the binomial
distribution (Equation 12 applies when there is a 50% probability of responding correctly) and
empirically derived cutoff scores to identify significantly worse than chance responding
or insufficient effort (132). If a subject responds significantly worse than chance based
on a certain cutoff score, there is strong evidence of intentional avoidance of the correct
answer (85).


(12) Uncorrected z score:

$$z = \frac{(\#\text{ of errors}) - 0.5(\#\text{ of items})}{\sqrt{0.25(\#\text{ of items})}}$$
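Equation (12) is the normal approximation to a binomial count of errors over n items when pure guessing gives a 50% chance of error on each item (expected errors 0.5n, variance 0.25n). The Python sketch below generalizes it with a hypothetical `p_correct` parameter and also computes the exact binomial tail probability for comparison. The function names and the 50-item example are illustrative assumptions, not scoring rules for any published SVT.

```python
import math


def forced_choice_z(errors, items, p_correct=0.5):
    """Uncorrected z for an error count; equals Equation (12) when p_correct = 0.5.

    Positive values mean more errors than guessing alone would produce.
    """
    p_error = 1 - p_correct
    expected = items * p_error
    sd = math.sqrt(items * p_correct * p_error)
    return (errors - expected) / sd


def normal_upper_tail(z):
    """One-tailed P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def exact_upper_tail(errors, items, p_correct=0.5):
    """Exact binomial P(X >= errors), where X is the number of errors under guessing."""
    p_error = 1 - p_correct
    return sum(math.comb(items, k) * p_error**k * p_correct**(items - k)
               for k in range(errors, items + 1))


# Hypothetical 50-item, two-alternative test with 34 errors.
z = forced_choice_z(34, 50)
print(f"z = {z:.2f}")  # z = 2.55
print(f"normal approx. p = {normal_upper_tail(z):.4f}, exact p = {exact_upper_tail(34, 50):.4f}")
```

On short tests, the normal approximation can differ noticeably from the exact binomial tail. The approximation as written also omits the common 0.5 continuity correction, which is presumably what "uncorrected" refers to.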

Freestanding SVTs can be categorized by the type of stimuli that form the basis
for the test, such as digit recognition tasks, letter- and word-based tasks, and visual or
mixed verbal-visual tasks. Many of the earliest freestanding SVTs used digit sequences.
One of these digit recognition tasks, the Hiscock and Hiscock (121) Forced-Choice Test,
required subjects to choose between two five-digit numbers shown on a card, one of
which had been shown to the subject prior to a brief delay. This test was also known as the
Hiscock Digit Memory Test (HDMT), and it became one of the first widely used SVTs
(105). The HDMT presented a five-digit number and then, after delays of increasing
length (5, 10, and 15 seconds), asked the participant to choose between two numbers,
one of which the subject had seen before. In 1993, Binder created the Portland Digit
Recognition Task (PDRT), a visual recognition task for orally presented, similar five-digit
number combinations, with items classified as “Easy” or “Hard.” The PDRT includes an
interference task of counting backward aloud during intervals, making it more difficult
for well-motivated subjects than similar tests without interference (132). The
Computerized Assessment of Response Bias (CARB; 1), as its name implies, is a
computer-based forced-choice task that presents a five-digit number to the examinee for a
few seconds. The Victoria Symptom Validity Test (VSVT; 243) is another computerized
forced-choice digit recognition test that contains 48 “Easy” or “Hard” items,
characterized by the number of shared digits between the target (i.e., correct response)
and the foil (i.e., incorrect response similar to the correct response). The VSVT program
uses Bayesian analyses and response latencies to identify invalid responding (132).

Another group of freestanding SVTs uses letters or words instead of digits. The
Word Memory Test (WMT; 98) and Medical Symptom Validity Test (MSVT; 94) are
well-researched word-based tests of cognitive effort. Both WMT and MSVT are
computerized effort tests that appear to be verbal memory tests of word pairs. The tests
assess immediate forced-choice recognition, delayed forced-choice recognition,
consistency of responses, delayed cued recall, and delayed free recall. The MSVT is
similar to the WMT, but uses a smaller word list and is faster to administer. Recently,
Tombaugh and colleagues (113; 264) developed the Computerized Tests of Information
Processing (CTIP), a computer-based reaction time and processing speed measure. The
CTIP uses three reaction time subtests to determine invalid responding: 1) simple
reaction time to a repeatedly occurring letter, 2) choice reaction time to a forced-choice
word recognition task, and 3) semantic search reaction time to determine whether a word
belongs semantically to a given category (163). Other freestanding SVTs using letters
and numbers include the Letter Memory Test (LMT; 130), the 21-Item Test (133), and
the b Test (38).

Some  freestanding  SVTs  use  visual  stimuli  or  a  combined  visual-verbal  format. 
Swiss  psychologist  Andre  Rey  designed  the  original  Dot  Counting  Test  (DCT)  in  1941. 

In  2002,  Boone,  Lu,  and  Herzberg  developed  a  slightly  different  version  of  the  Rey  DCT. 
Both  Rey’s  and  Boone  et  al.’s  Dot  Counting  Tests  examine  whether  total  dot  counting 
time  is  related  to  increasing  task  difficulty.  Rey  (214)  also  developed  the  Fifteen-Item 
Test  (FIT),  a  non-forced-choice  SVT  that  presents  fifteen  designs  to  a  subject  for  a  brief 
period  of  time  and  later  has  the  subject  reproduce  as  many  designs  as  possible.  The  FIT 
is  also  called  the  “Rey  Memory  for  15  Items  Test”  (132),  “Rey’s  Memory  Test”  (24),  and 
the  Rey  15-Item  Memory  Test”  (232).  Although  its  utility  as  a  symptom  validity  test  is 
poor  (216),  the  FIT  remains  one  of  the  most  commonly  used  SVTs  (239;  247).  The  Test 
of  Memory  Malingering  (TOMM;  263)  is  a  visually-based,  forced-choice  SVT  that  asks 
participants  to  recognize  line  drawings  presented  on  either  paper  booklets  or  computer 
monitors.  Green’s  (97)  Nonverbal  Medical  Symptom  Validity  Test  (NV-MSVT)  is  the 
visual equivalent of the previously described MSVT (94), using forced-choice
recognition  memory  of  10  visually-presented  color  image  pairs.  Some  less  frequently 
used  SVTs  that  employ  visual  and  mixed  visual-verbal  formats  include  the  Amsterdam 
Short-Term Memory Test (ASMT; 229), the Validity Indicator Profile (VIP; 83), and the
Coin-in-the-Hand  Test  (141). 

Other  freestanding  SVTs  include  self-report  measures  designed  specifically  for 
the  identification  of  symptom  exaggeration.  The  Structured  Interview  of  Reported 
Symptoms  (SIRS;  219)  has  been  described  as  “the  gold  standard”  for  examining 
malingered  mental  illness  in  the  field  of  forensic  psychology  and  psychiatry.  The  Miller 
Forensic Assessment of Symptoms Test (MFAST; 180) is also widely used with criminal
defendants, including in evaluations of competency to stand trial (136). The Structured Inventory of
Malingered  Symptomatology  (SIMS;  281)  contains  75  true-false  items  designed  to 
provide  a  self-administered  malingering  screening  in  about  15  minutes.  Designed 
specifically  to  evaluate  symptom  exaggeration  in  PTSD  claimants,  the  Morel  Emotional 
Numbing  Test  (MENT;  184)  has  extensive  evidence  supporting  its  ability  to  identify 
invalid  responding  (176;  185). 

Embedded  Validity  Indices 

Unlike  freestanding  symptom  validity  tests,  which  are  designed  with  the  specific 
purpose  of  identifying  invalid  responding,  embedded  validity  indices  are  empirically 
derived  from  commonly  administered  neuropsychological  or  psychiatric  tests  (234). 
Embedded  validity  indices  may  consist  of  a  single,  specially  developed  test  score  (e.g., 
Reliable  Digit  Span  [RDS];  101),  combinations  of  scores  (e.g.,  Vocabulary  minus  Digit 
Span [VDS] on the Wechsler Adult Intelligence Scale-III [WAIS-III; Wechsler, 1997];
59),  or  standard  clinical  scores  that  have  been  shown  to  discriminate  between  malingerers 
and  valid  responders  (27).  Embedded  measures  provide  examiners  several  advantages 
when  used  in  lieu  of  or  in  conjunction  with  freestanding  SVTs.  First,  embedded  indices 
allow for efficient assessment of performance validity without adding to the time
constraints  of  a  typically  lengthy  assessment  battery  (27;  36;  234).  Next,  embedded 
validity  indices  appear  to  be  less  vulnerable  to  the  effects  of  coaching  than  forced-choice 
SVTs  (234).  Instead  of  solely  testing  one’s  “memory,”  as  most  freestanding  SVTs 
purport  to  do,  embedded  measures  can  be  derived  from  tests  across  multiple  cognitive 
domains,  such  as  attention,  processing  speed,  executive  functioning,  and  psychomotor 
speed (36; 234). Adding embedded validity indices to an assessment increases the
overall number of validity indicators in the evaluation, increasing the likelihood that negative response
bias will be detected (27; 234). Lastly, embedded measures allow performance validity
assessment at multiple time points, which is important because a person’s responding
behavior may vary within and across tests throughout an assessment (36; 116).

Embedded  validity  indices  can  also  be  derived  from  self-report  inventories  and 
questionnaires.  Generally,  the  “hallmark  of  functional  and  simulated  disorders  on  these 
paper-and  pencil  scales  and  inventories  is  abnormally  exaggerated  complaints — whether 
in  their  variety,  severity,  or  both”  (163,  p.  858).  It  is  common  for  neuropsychological 
assessments  to  include  measures  of  emotional  functioning  and  personality,  and  many  of 
these  measures  contain  built-in  validity  scales.  The  Minnesota  Multiphasic  Personality 
Inventory-2 (MMPI-2; 43), for example, includes three traditional validity scales (i.e., L, F, and K)
along  with  its  ten  clinical  scales.  Several  validity  scales  aimed  at  identifying  varying 
forms  of  symptom  exaggeration  and  noncredible  responding  have  been  derived  for  the 
MMPI-2,  including  the  Symptom  Validity  Scale  (abbreviated  as  “FBS”  since  it  was 
originally  called  the  “Fake  Bad  Scale”;  162),  the  Response  Bias  Scale  (RBS;  88),  the 
Henry-Heilbronner  Index  (HHI;  120),  and  the  Meyers  Index  (177).  The  Personality 
Assessment Inventory (PAI; 186) contains four measures of response bias and validity:
Inconsistency (ICN), Infrequency (INF), Negative Impression (NIM), and Positive
Impression (PIM). The Millon Clinical Multiaxial Inventory-III (182) uses three
modifier indices to evaluate symptom validity: Disclosure (i.e., how much psychological
information one is revealing), Desirability (i.e., under-reporting), and Debasement (i.e.,
over-reporting). It should be noted that self-report inventories and questionnaires are
generally less sensitive in detecting bona fide response bias than performance-based
validity indices, but they are nonetheless useful in describing both the internal consistency
of a patient’s complaints and how those complaints align with a patient’s cognitive test
performance and medical status (163).

Schutte  and  Axelrod  (234)  describe  several  methods  used  to  empirically  derive 
embedded  validity  indices:  considering  floor  effects  in  determining  variables  of  interest, 
incorporating  forced  choice  components  into  existing  tests,  and  conducting  studies  using 
simulation and known-groups (i.e., criterion variable) research designs. Regarding floor
effects, one would expect less-injured persons (i.e., mild TBI) to perform better on a
cognitive test than persons with more severe injuries (i.e., moderate-to-severe TBI).
Differential prevalence designs can identify indices on which less-injured persons perform
worse than their more severely injured peers, a pattern that may later be used to identify
invalid responding. Tests that include a forced-choice component can also be used to
identify response bias; examples include the California Verbal Learning Test-II (CVLT-II; 63),
Rey Auditory Verbal Learning Test (RAVLT; 231), Warrington’s Recognition Memory Test
(RMT; 274), and the Seashore Rhythm Test (236). Cut scores on these forced-choice measures
are determined by statistically comparing performance between known groups (e.g., honest
responders, malingerers, clinical patients). Additionally, prospective and retrospective
simulation and criterion variable study designs can identify indices with between-group
differences, which can then be evaluated using classification accuracy statistics (e.g.,
sensitivity and specificity).
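To make the classification accuracy step concrete, the following sketch computes sensitivity and specificity for a single cut score from two known groups. It is a minimal illustration under stated assumptions, not a published scoring procedure: the scores are invented, lower scores are assumed to indicate poorer performance, a score at or below the cut is treated as a "fail," and the function names are hypothetical.

```python
# Minimal sketch: sensitivity and specificity at one cut score.
# Assumptions (illustrative only): lower scores = poorer performance, and a
# score at or below the cut score is classified as a "fail" (invalid).

# Invented scores for two known groups.
honest = [9, 10, 8, 11, 9, 12, 10, 9]
malingering = [6, 7, 9, 5, 8, 6, 7, 10]


def fails(score, cut):
    """Return True when a score falls at or below the cut score."""
    return score <= cut


def sensitivity_specificity(invalid_scores, valid_scores, cut):
    """Sensitivity: proportion of invalid responders correctly flagged.
    Specificity: proportion of valid responders correctly passed."""
    true_pos = sum(fails(s, cut) for s in invalid_scores)
    true_neg = sum(not fails(s, cut) for s in valid_scores)
    return true_pos / len(invalid_scores), true_neg / len(valid_scores)


sens, spec = sensitivity_specificity(malingering, honest, cut=7)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

With these invented data and a cut score of 7, five of the eight simulated malingerers and none of the honest responders fail, giving a sensitivity of .625 and a specificity of 1.00.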

Limitations  of  Freestanding  SVTs  and  Embedded  Validity  Indices 

Freestanding symptom validity tests and embedded validity indices each have
unique limitations that merit specific discussion. A plethora of information about SVTs
and how to “beat” them is readily available on the internet, threatening test security
(20). Widespread coaching and access to SVT information have caused several
well-validated SVTs to lose their sensitivity to invalid responding over the past few
decades (223). Forced-choice paradigms, the most common type of freestanding SVT,
are particularly easy to identify and are vulnerable to coaching effects (234).
Unscrupulous attorneys or individuals awaiting medico-legal evaluations can easily
describe to others which tests appear to “test one’s memory” using digits, words, letters,
or visuals while actually testing for invalid responding. Freestanding measures can also be
time-consuming, a critical drawback in neurocognitive assessment, where available time and
patient tolerance are constrained. Examiners often have time to administer only one
freestanding SVT, if any at all (116). If only one invalid responding measure is
administered  at  a  single  time  point  in  an  evaluation,  the  measure  may  not  adequately 
characterize  the  individual’s  responding  behavior  throughout  the  evaluation  (36). 

Embedded  validity  indices  offer  many  advantages  over  freestanding  SVTs,  such 
as  increased  test  efficiency,  assessment  in  multiple  cognitive  domains  other  than 
“memory,”  less  vulnerability  to  coaching,  and  assessment  at  multiple  time  points  (234). 
However, embedded validity indices are generally inferior to freestanding SVTs in their
individual ability to accurately identify invalid responding (179). Examiners must also
consider correlations among multiple embedded indices, because failures on several
highly similar indices largely reflect the same underlying performance and do not necessarily
provide independent, convergent evidence of invalid responding (224). Boone (36) recommends using multiple, modestly correlated
embedded  indices  in  conjunction  with  select  freestanding  SVTs  to  maximize  the 
likelihood  of  correctly  identifying  invalid  responding  behavior.  To  limit  the  effects  of 
shared  variance  among  embedded  indices,  it  may  be  useful  for  examiners  to  use 
embedded  indices  from  multiple  cognitive  domains  in  an  assessment. 

Recently,  Schutte  and  Axelrod  (234)  summarized  the  embedded  validity  research 
pertaining  to  mild  TBI,  presenting  sensitivities  and  specificities  for  varying  cut  scores  on 
embedded  indices  within  tests  of  attention/processing  speed,  motor  functioning, 
visuospatial  functioning,  executive  functioning,  visuospatial  memory,  and  verbal 
learning/memory. Not surprisingly, sensitivity and specificity fluctuated as a function of
the cut score: lower cut scores (i.e., more conservative thresholds, so that fewer examinees
“fail”) yielded higher specificity but lower sensitivity, and vice versa. Consistent with
research associating invalid responding with slower performance (14), the embedded
indices with the best combination of sensitivity and specificity were most commonly
derived from tests of reaction time and reaction time variability.
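The trade-off between sensitivity and specificity can be illustrated by sweeping a range of cut scores over two groups. The sketch below uses invented data and hypothetical names, and it assumes that a score at or below the cut counts as a "fail"; under that convention, lowering the cut raises specificity and lowers sensitivity.

```python
# Minimal sketch: how sensitivity and specificity trade off across cut scores.
# Assumption (illustrative only): a score at or below the cut is a "fail".

honest = [9, 10, 8, 11, 9, 12, 10, 9]       # invented valid responders
malingering = [6, 7, 9, 5, 8, 6, 7, 10]     # invented invalid responders


def rates(cut):
    """Return (sensitivity, specificity) for a given cut score."""
    sensitivity = sum(s <= cut for s in malingering) / len(malingering)
    specificity = sum(s > cut for s in honest) / len(honest)
    return sensitivity, specificity


table = {cut: rates(cut) for cut in range(5, 11)}
for cut, (sens, spec) in table.items():
    print(f"fail if score <= {cut}: sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

With these data, a cut score of 7 flags 62.5% of the simulators with no false positives, whereas raising the cut score to 9 flags 87.5% of the simulators but misclassifies half of the honest responders.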

Reaction Time and Embedded Validity Indices in Continuous Performance Tests

Reaction  time  (i.e.,  response  time),  well-recognized  as  a  sensitive  metric  for 
detecting  brain  damage,  can  also  be  used  to  detect  malingering  (112).  Reaction  time  was 
commonly used as an indicator of deception in the early 20th century (91; 189).
However, the practice fell out of favor in the early 1930s, when it was largely replaced by
measures of autonomic arousal for detecting deception (39).

Recently,  computerized  cognitive  testing  has  rejuvenated  the  use  of  reaction  time 
as a useful measure of invalid responding (113; 230; 271; 282). Across the
available literature, simple and choice reaction times consistently appear to be slower
during invalid than during honest responding (39; 140), suggesting that reaction times are
delayed when planning and executing an invalid response (282). Reaction time variability,
another common measure of cognitive performance, has been studied in patients with brain
damage (40), HIV/AIDS (73), and ADHD (194). Willison and Tombaugh (282) recently reported
that greater reaction time variability can help detect simulated TBI, because formulating and
executing simulation strategies during testing increases response variability.
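As a minimal sketch of how these two reaction time metrics might be computed for one examinee, the code below returns the mean reaction time and an intra-individual variability (IIV) estimate. It assumes IIV is operationalized as the within-person standard deviation, with the coefficient of variation shown as an alternative; published measures may define variability differently, and the reaction times are invented.

```python
import statistics


def rt_summary(rts_ms):
    """Return mean RT, within-person SD, and coefficient of variation.

    Assumption: intra-individual variability (IIV) is operationalized as the
    sample standard deviation of one person's reaction times.
    """
    mean_rt = statistics.mean(rts_ms)
    iiv_sd = statistics.stdev(rts_ms)      # sample SD (n - 1 denominator)
    iiv_cv = iiv_sd / mean_rt              # SD scaled by the person's mean
    return mean_rt, iiv_sd, iiv_cv


# Invented reaction times (ms) for two hypothetical examinees.
steady = [310, 295, 305, 300, 290]
erratic = [280, 450, 320, 600, 350]

for label, rts in (("steady", steady), ("erratic", erratic)):
    mean_rt, sd, cv = rt_summary(rts)
    print(f"{label}: mean = {mean_rt:.1f} ms, SD = {sd:.1f} ms, CV = {cv:.3f}")
```

Here the erratic profile yields a far larger SD (about 128 ms versus about 8 ms) even though its mean is only 100 ms slower, which is the kind of pattern reaction time variability indices are intended to capture.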

Omission  and  commission  errors  can  be  used  to  measure  distractibility  and 
impulsivity,  respectively  (55).  These  variables  are  commonly  measured  in  continuous 
performance  tests,  which  feature  multiple  trials  appearing  in  rapid  succession  over  a 
given length of time. Omission errors occur when a subject fails to respond (e.g., by pressing
a button) within a given time limit, usually the length of the trial. Conversely,
commission errors occur when a subject over-responds (e.g., presses a button multiple times) or
responds incorrectly (e.g., presses a button instead of withholding the response) on a given
trial. Omission and commission errors have recently been investigated as invalid
responding  detection  metrics  (42;  48;  160;  191;  198). 
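The scoring rules above can be expressed as a short sketch. The `Trial` structure and `score_errors` function are hypothetical; the code assumes each trial records whether a response was required and the times of any button presses within the trial window. It applies the definitions in this section literally: no press on a response trial is an omission, and any press on an inhibition trial, or more than one press on a response trial, is a commission.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Trial:
    """One continuous performance trial (hypothetical structure)."""
    respond: bool  # True: a response is required; False: the response must be withheld
    press_times_ms: List[float] = field(default_factory=list)  # presses within the trial window


def score_errors(trials):
    """Count omission and commission errors using the definitions in the text."""
    omissions = sum(1 for t in trials if t.respond and not t.press_times_ms)
    commissions = sum(
        1
        for t in trials
        if (not t.respond and t.press_times_ms)       # pressed when it should have inhibited
        or (t.respond and len(t.press_times_ms) > 1)  # over-responded on a response trial
    )
    return omissions, commissions


trials = [
    Trial(respond=True, press_times_ms=[352.0]),          # correct response
    Trial(respond=True),                                  # omission
    Trial(respond=False),                                 # correct inhibition
    Trial(respond=False, press_times_ms=[418.0]),         # commission (failed inhibition)
    Trial(respond=True, press_times_ms=[301.0, 317.0]),   # commission (multiple presses)
]
print(score_errors(trials))  # → (1, 2)
```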

Popular computer-based, continuous performance measures of attention, such as
the Test of Variables of Attention (TOVA; 161) and Conners’ Continuous Performance
Test-II (CPT-II; 55), measure sustained attention and concentration using multiple
variables, including reaction time, reaction time variability, omission errors, and
commission errors. Many of the variables derived from continuous performance
tests have been found to be useful in differentiating valid from invalid responding. For
example, Leark and colleagues (160) administered the TOVA to a sample of 36
undergraduate volunteers, who took the test under both a “faking bad” instruction set and
a “normal conditions” instruction set, with the order of instruction sets counterbalanced.
The authors reported that under “faking bad” instructions, participants made significantly
more omission and commission errors and showed slower mean reaction time and greater
reaction time variability than under “normal conditions” instructions. Using a known-groups
design and drawing from a sample of 52 neuropsychological evaluation referrals (fifty for
mild TBI involved in personal injury litigation, one for fibromyalgia-related disability, and one
for chronic pain-related disability), Henry (118) also reported that TOVA omission and
commission errors, reaction time, and reaction time variability were all significantly
greater in the “probable malingering” group than in the “not malingering” group.

Multiple  researchers  have  reported  that  omission  errors  on  the  CPT-II 
demonstrate  acceptable  classification  accuracy  for  invalid  responding  (42;  198).  Lange 
and colleagues (149) reported that CPT-II omissions, commissions, and perseverations may
be useful to rule in poor effort but not necessarily to rule it out. Ord and colleagues (198)
reported that the CPT-II’s Hit Reaction Time Standard Error (i.e., reaction time
variability) demonstrated acceptable classification accuracy for invalid responding.
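Classification accuracy in such studies is often summarized with the area under the receiver operating characteristic curve (AUC). The sketch below computes AUC directly as the probability that a randomly chosen invalid responder obtains a higher (worse) score than a randomly chosen valid responder, which is equivalent to the Mann-Whitney U statistic divided by the product of the group sizes. The verbal labels follow the commonly cited Hosmer and Lemeshow convention (.70 to .80 acceptable, .80 to .90 excellent, .90 or above outstanding). The data are invented, and higher scores (e.g., greater reaction time variability) are assumed to indicate invalid responding.

```python
def auc(invalid_scores, valid_scores):
    """Area under the ROC curve as the probability that a randomly chosen invalid
    responder scores higher than a randomly chosen valid responder (ties = 0.5).
    This equals the Mann-Whitney U statistic divided by n_invalid * n_valid."""
    wins = 0.0
    for x in invalid_scores:
        for y in valid_scores:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(invalid_scores) * len(valid_scores))


def describe(auc_value):
    """Verbal label following the Hosmer and Lemeshow convention."""
    if auc_value >= 0.90:
        return "outstanding"
    if auc_value >= 0.80:
        return "excellent"
    if auc_value >= 0.70:
        return "acceptable"
    return "below acceptable"


# Invented reaction time SDs (ms); higher values are assumed to indicate invalid responding.
invalid_rt_sd = [120, 95, 140, 88, 150]
valid_rt_sd = [60, 72, 90, 55, 80]

value = auc(invalid_rt_sd, valid_rt_sd)
print(f"AUC = {value:.2f} ({describe(value)})")
```

In this invented example, 24 of the 25 invalid-valid pairs are correctly ordered, giving an AUC of .96.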

Common dependent variables in continuous performance tests (reaction time,
reaction time variability, omission errors, and commission errors) appear to serve as
useful embedded validity indices capable of reliably differentiating invalid responding
from valid responding. Newer tests, such as the CTIP (113), also show
considerable promise as freestanding SVTs by using reaction time and reaction time
variability to detect invalid responding (211). Unlike traditional forced-choice SVTs
that primarily rely on digit, letter, or word recognition and memory, continuous
performance tests can assess invalid responding across a variety of cognitive domains,
including attention, executive functioning, and processing speed (33; 54). The next
section will discuss attentional processes in greater detail to provide insight into why
continuous performance tests may be uniquely suited for assessing invalid responding.

Attention  and  Oculomotor  Functioning 

Conceptually, attention serves as a basic set of mechanisms that facilitates one’s
awareness of the world and the voluntary regulation of thoughts and emotions (204).
Neurological disease and dysfunction can impair attentional processes, making attention a
prime target for brain disorders research. Aspects of attention can be manipulated and
controlled experimentally, providing researchers and clinicians with a window into the
underlying neuroanatomical functioning of a patient. Furthermore, attention has been
described as a cognitive process that is sensitive to impairment from TBI (54).

Vision  involves  a  continuous  engagement  and  disengagement  of  attention,  where 
individuals  fixate  their  attention  on  an  object,  then  disengage  their  attention  in  order  to 
fixate  on  a  new  object.  Oculomotor  functioning  has  been  said  to  blend  cognition  and 
perception,  where  eye  movements  represent  one’s  cognitions,  expectations,  and 
motivations for comprehension (74). Visually guided eye movements are regulated by
central visuomotor structures, the afferent visual system, and the efferent oculomotor
system; together, these systems include the retina, supplementary eye fields, superior
colliculus, lateral geniculate nucleus, frontal eye fields, prefrontal cortex, striate cortex,
parietal cortex, basal ganglia, and brain stem (81). These oculomotor pathways,
particularly in the frontal eye fields, supplementary eye fields, prefrontal cortex, and
parietal cortex, demonstrate extensive overlap with cognitive processes such as
attention, working memory, and learning, suggesting that these systems are functionally
interrelated (203). Presumably, if continuous performance tests and other tests of
attention  can  be  used  to  detect  invalid  responding,  then  eye  movements  related  to 
attention  may  also  serve  as  tools  for  response  validity  assessment.  The  next  section  will 
describe  the  types  of  eye  movements  that  may  be  used  to  measure  attentional  processes. 

Fixations  and  Saccades 

Fixations and saccades are complementary components of eye movements.
Fixations occur when a person focuses on a specific stimulus and stabilizes his or her
gaze on it. As such, when one “looks” at a given object, that person is “fixating” on that
object. Fixations are controlled by both voluntary and involuntary fixation mechanisms
(107; 108). As the name implies, voluntary fixations occur under the control of the
individual who willfully moves his or her eyes onto a given object; these fixations are
controlled by bilateral cortical fields in the premotor cortex of the frontal lobes (107;
108).  Involuntary  fixations,  on  the  other  hand,  “lock”  the  eyes  onto  an  object  once  it  has 
been  found,  and  are  controlled  by  secondary  visual  areas  in  the  occipital  cortex  (107; 
108).  Involuntary  and  voluntary  fixations  work  hand-in-hand  with  each  other.  As  a 
voluntary  fixation  ends,  an  involuntary  fixation  begins. 
When one breaks his or her fixation or “lock” on a given point of gaze to fixate
upon  a  new  object,  he  or  she  is  likely  making  a  saccadic  movement  towards  a  new  point 
of  gaze.  Saccades  are  quick,  jerky  eye  movements  that  occur  between  fixations  during 
the  search  for  visual  targets  (17;  139).  Regulated  by  the  superior  colliculus  (17), 
saccades  occur  very  rapidly,  lasting  20-50  ms  (268).  In  a  given  eye  movement — from 
saccadic  initiation  to  the  final,  involuntary  fixation — the  saccadic  movement  itself 
encompasses  only  10%  of  the  total  eye  movement  duration,  while  the  fixation  on  a  target 
encompasses  the  other  90%  (107;  108).  Saccadic  movements  are  ballistic  in  nature;  once 
initiated,  the  speed  or  direction  of  a  saccade  cannot  be  corrected  (139).  During  a  saccadic 
movement,  the  brain  automatically  blocks  visual  input  from  being  processed  (107;  108). 

Saccadic  Processes  and  Cognition 

From  a  cognitive  neuroscience  perspective,  saccadic  eye  movements  are 
influenced  by  conscious  (i.e.,  deliberate)  and  unconscious  (i.e.,  automatic)  responses  to 
internal  and  external  stimuli.  According  to  Fischer’s  “three-loop”  model  (78;  79;  81), 
three  processes  occur  before  a  saccade  is  made:  disengagement  of  visual  attention, 
decision  to  execute  a  saccade,  and  calculation  of  saccade  “metrics”  (e.g.,  direction, 
amplitude,  velocity)  needed  to  reach  the  target.  Several  factors  influence  these  saccadic 
processes. 

Before  a  saccade  can  be  generated,  attentional  disengagement  from  an  object  of 
focus must occur. Either the individual or the object may facilitate this disengagement:
the individual may consciously shift focus away from an object, or the object may
disappear (thus temporarily leaving the individual without an object of focus). When
a fixated-upon object disappears, a brief “gap” occurs before an individual shifts his or
her attentional focus elsewhere.

Forced visual disengagement experiments, or “gap paradigms,” are commonly
used in studies of saccades (68; 144). Gap paradigms compare saccadic
performance between “gap” and “overlap” conditions of a measure. In an experimentally
manipulated “gap” condition, a fixated-upon object disappears for a brief time (usually
around 200 ms) before a new object appears in the participant’s field of view. These
“gap” conditions appear to release subjects’ fixations automatically, freeing
subjects to rapidly redirect their attention towards a new object. Conversely, in “overlap”
conditions,  a  new  object  appears  before  the  fixated-upon  object  disappears,  forcing  the 
individual  to  “break”  the  fixation  to  generate  a  saccade  towards  a  new  object. 

Researchers  have  hypothesized  that  gaps  enable  disinhibition  of  saccadic 
movement  while  overlap  conditions  inhibit  new  fixations  (128).  As  such,  saccadic 
latencies  are  typically  shorter  (i.e.,  faster  or  smaller)  in  gap  conditions  and  longer  (i.e., 
slower  or  larger)  in  overlap  conditions  (80;  275).  This  “gap  effect”  is  believed  to  be 
moderated  by  attention  and  mediated  by  a  “fixation  release”  component  (128). 
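Operationally, the gap effect is commonly expressed as a difference score. The minimal sketch below assumes that the effect is computed for one participant as mean overlap-condition saccadic latency minus mean gap-condition latency, so that positive values indicate faster saccades in the gap condition. The latencies and function name are invented for illustration.

```python
from statistics import mean


def gap_effect(gap_latencies_ms, overlap_latencies_ms):
    """Gap effect for one participant: mean overlap latency minus mean gap latency.
    Positive values mean saccades were initiated faster in the gap condition."""
    return mean(overlap_latencies_ms) - mean(gap_latencies_ms)


# Invented saccadic latencies (ms) for one participant.
gap_trials = [150, 165, 140, 155]
overlap_trials = [230, 250, 245, 235]

print(f"gap effect = {gap_effect(gap_trials, overlap_trials):.1f} ms")
```

With these values, mean latency is 152.5 ms in the gap condition and 240 ms in the overlap condition, a gap effect of 87.5 ms.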

Cues  are  sensory  stimuli  of  all  types — biological,  psychological,  and 
environmental — that  influence  the  decision  to  generate  a  saccade  and  the  execution  (i.e., 
calculation)  of  the  saccadic  movement.  Cues  that  predict  an  object’s  location,  also 
known  as  predictive  cues,  help  reduce  the  latency  of  saccades  made  towards  the  target 
(46).  These  predictive  cues  orient  an  individual  towards  an  area  of  focus.  Cues  that 
distract  or  incorrectly  predict  an  object’s  location,  on  the  other  hand,  increase  latencies  of 
saccades  towards  an  object  (273).  These  misleading  or  invalid  cues  force  one  to  inhibit 
saccadic responses or rapidly identify and correct a saccadic error. The relationship
between  stimulus  cues  and  saccadic  latencies  suggests  a  functional,  cognitive  relationship 
between  attention,  attentional  networks,  and  saccadic  eye  movements  (68;  128;  204). 
Systems  designed  to  quantify  eye  movement  metrics  thus  appear  to  be  uniquely  suited  to 
measure  attentional  processing. 

Eye  Tracking  Research  of  Cognitive  Functioning 

Eye movement processes like fixations and saccades fall within a relatively
narrow range of performance metrics for most individuals, making them highly reliable
for comparisons between groups with and without a history of brain injury (74). In recent
years, several eye tracking systems have been developed to precisely record and quantify
these eye movements. Using high-speed cameras and advanced processing equipment,
eye tracking systems can measure optokinetic activity and compute data for one’s
pupillometry, fixation location and duration, and saccadic latency, velocity, and accuracy
(69). Eye tracking systems can measure multiple components of fixations and saccades,
enabling comparisons between groups of interest.
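To illustrate how raw gaze samples can be turned into fixations, saccades, and saccadic latency, the sketch below uses a simple velocity-threshold rule (often called I-VT). It is a deliberately simplified illustration, not the algorithm of any particular eye tracking system. It assumes one-dimensional gaze position in degrees of visual angle, a constant sampling rate, no filtering of noise or blinks, and a 30 deg/s threshold chosen for illustration; the trace and function names are hypothetical.

```python
def classify_samples(positions_deg, sample_rate_hz, threshold_deg_per_s=30.0):
    """Label each gaze sample as 'fixation' or 'saccade' with a velocity threshold.

    Simplifying assumptions: one-dimensional gaze position in degrees of visual
    angle, a constant sampling rate, and no noise or blink filtering.
    """
    dt = 1.0 / sample_rate_hz
    labels = ["fixation"]  # the first sample has no preceding sample to compute velocity
    for prev, cur in zip(positions_deg, positions_deg[1:]):
        velocity = abs(cur - prev) / dt  # degrees per second
        labels.append("saccade" if velocity > threshold_deg_per_s else "fixation")
    return labels


def saccadic_latency_ms(labels, sample_rate_hz, target_onset_index):
    """Time from target onset to the first saccade sample, or None if no saccade
    occurs (which a task might score as a saccadic omission)."""
    for i in range(target_onset_index, len(labels)):
        if labels[i] == "saccade":
            return (i - target_onset_index) * 1000.0 / sample_rate_hz
    return None


RATE_HZ = 500  # hypothetical tracker sampling rate (one sample every 2 ms)
# Invented trace: 90 samples fixating at 0 deg, a 10-sample (20 ms) saccade to
# 10 deg, then fixation on the new target. Target onset is at sample 0.
positions = [0.0] * 90 + [float(i) for i in range(1, 11)] + [10.0] * 20

labels = classify_samples(positions, RATE_HZ)
latency = saccadic_latency_ms(labels, RATE_HZ, target_onset_index=0)
print(f"saccade samples: {labels.count('saccade')}, latency: {latency} ms")
```

In this invented trace, the saccade begins 90 samples (180 ms) after target onset and lasts 20 ms, consistent with the saccade durations described earlier. Production systems typically work in two dimensions and add smoothing, minimum-duration rules, and blink handling.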

In  the  past  decade,  cognitive  neuroscientists  have  used  advancements  in  eye 
tracking  technology  and  novel  eye  tracking  techniques  to  study  attention,  response 
inhibition,  working  memory,  processing  speed,  and  executive  function  (16;  92;  128;  190; 
196;  203).  Several  studies  of  neurological  injuries  and  neurodegenerative  disorders 
support  the  idea  that  eye  movements  are  closely  related  to  brain  functioning  (190;  203; 
240). Crawford and colleagues (58) used eye tracking equipment to record saccadic eye
movements, saccadic inhibitory control, and saccadic error metrics (e.g., saccadic
omissions, commissions, and correction latencies) in young and old groups of dementia
patients and healthy controls. They reported that the most reliable oculomotor index of
dementia severity was the number of error correction failures, or the lack of corrective
saccadic responses to omission or commission errors. Eye movement abnormalities have
also been reported in patients with schizophrenia (235) and Parkinson’s disease (270).
Furthermore, a rapidly growing body of evidence suggests that eye movements and
fixations directly correspond to attention and executive functions, two cognitive
processes commonly disrupted by TBI (74; 117; 148).

Eye  tracking  studies  have  found  oculomotor  deficits  in  brain  injured  individuals 
long after sustaining the injuries. Most self-reported neurobehavioral symptoms diminish
within seven to ten days following a single, uncomplicated mild TBI, with “full” recovery
normally occurring within three months (171). However, poorer oculomotor
performance was detected in groups of mild TBI patients three to five months after injury
(117), mild and moderate-to-severe TBI patients six months after injury (147), and mild
and moderate-to-severe TBI patients more than twelve months after injury (148). As
such, oculomotor metrics appear to be sensitive to neuronal injury long after “normal”
recovery time, and may serve as useful tools for long-term evaluation of brain injuries.

The  Bethesda  Eye  &  Attention  Measure  (BEAM) 

In 2010, this author and his academic advisor developed the Bethesda Eye &
Attention Measure (BEAM), a novel, computer-based eye tracking tool designed to
assess cognitive function (18). The BEAM was originally conceptualized as a measure to
detect cognitive deficits in the post-acute stage of mild TBI. It was designed as a
12-minute continuous performance test with a multiple-trial format. The BEAM uses six
pseudorandomly presented trial types, each with unique visual stimuli (i.e., cues) that
were designed to elicit specific cognitive processes of attention and executive function.
These cues include white arrows, red arrows, and diamonds, which may or may not
predict where a target circle will appear. Gap conditions and overlap conditions
are also interwoven into the four counterbalanced blocks of trials. Saccadic and manual
(i.e., button press) information is collected on each trial. Reaction time metrics and
omission errors are measured on five non-inhibition trial types, and commission errors
are measured on a sixth, inhibition trial type.

A feasibility study of the BEAM using 11 subjects without a history of head
injury found the BEAM to have excellent internal consistency for manual reaction time
(all Cronbach’s alpha values > .97) and acceptable-to-excellent internal consistency for
saccadic reaction time (all Cronbach’s alpha values > .74; overall saccadic reaction time
Cronbach’s alpha = .94; 18). Despite the small sample size, the BEAM was able to elicit
gap, alerting, orienting, and executive effects (see 204) with large effect sizes. The trial
design accounted for 79.1% of the variance in manual reaction time and 74.8% of the
variance  in  saccadic  reaction  time.  This  author  concluded  that  the  BEAM  may  be  a 
psychometrically  sound  tool  to  assess  attention,  executive  function,  and  processing  speed 
in  a  relatively  short  amount  of  time,  and  further  investigation  was  merited  (18). 
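For reference, Cronbach’s alpha for k items is alpha = k / (k - 1) x (1 - sum of the item variances / variance of the total scores). The sketch below implements that formula for a participants-by-items table. Treating each block of trials as an "item" and the reaction time values themselves are illustrative assumptions, not the scoring used in the feasibility study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a participants-by-items table.

    scores: one row per participant, each row holding k item scores.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Invented mean reaction times (ms): rows are participants, columns are four
# trial blocks treated here as "items" (an illustrative assumption).
block_rts = [
    [310, 320, 305, 315],
    [420, 410, 430, 425],
    [365, 370, 360, 372],
    [280, 290, 285, 282],
    [500, 490, 510, 505],
]
print(f"alpha = {cronbach_alpha(block_rts):.3f}")
```

Because differences between these invented participants are large relative to the block-to-block fluctuation within each participant, alpha comes out very close to 1.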

A  subsequent  analysis  of  BEAM  data  collected  from  a  follow-on  study  found 
saccadic  and  manual  reaction  time  to  be  significantly  correlated  with  neuropsychological 
measures  of  attention,  executive  function,  and  processing  speed  after  controlling  for  age, 
education, and gender (19). In the same study, manual reaction time was correlated with
self-report measures of depression, traumatic stress, and combat exposure, but saccadic
reaction time was not, suggesting BEAM saccadic reaction time may be resistant to
common psychological confounds found in neuropsychological assessment (19). A
separate study of BEAM data found that saccadic commission errors were negatively
associated with executive functions and working memory after controlling for age and
education (201).
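"Controlling for" covariates such as age and education in these analyses typically means computing a partial correlation. The sketch below uses the standard recursive formula, removing one covariate at a time, so a list such as [age, education] can be passed directly. The toy vectors are constructed so that x and y are related only through a shared covariate z; in that case the raw correlation is .50 while the partial correlation is essentially zero. Variable and function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, covariates):
    """Correlation of x and y controlling for each vector in `covariates`,
    using the recursive partial correlation formula (one covariate at a time)."""
    if not covariates:
        return pearson(x, y)
    z, rest = covariates[-1], covariates[:-1]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data: x and y each equal z plus a component orthogonal to everything else,
# so they are related only through the shared covariate z.
z = [1, -1, 1, -1]
x = [2, 0, 0, -2]   # z + [1, 1, -1, -1]
y = [2, -2, 0, 0]   # z + [1, -1, -1, 1]

print(f"raw r = {pearson(x, y):.2f}")
print(f"partial r controlling for z = {abs(partial_corr(x, y, [z])):.2f}")
```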

While  originally  designed  to  identify  cognitive  deficits  associated  with  mild  TBI, 
the  BEAM  appears  to  have  potential  applications  in  many  other  contexts  requiring 
neurocognitive  assessment.  One  such  area  for  exploration  is  response  validity 
assessment.  The  BEAM  is  a  continuous  performance  task,  a  measure  that  presents  a 
large  number  of  trials  in  a  short  amount  of  time.  As  described  earlier,  continuous 
performance tests have demonstrated utility for discriminating between groups of valid
and  invalid  responders.  Reaction  time,  reaction  time  variability,  omission  errors,  and 
commission  errors,  each  identified  as  embedded  validity  indices  on  the  CPT-II  (55), 
TOVA  (161),  and  CTIP  (113),  are  calculated  in  BEAM  output  data  for  both  manual  and 
saccadic  responses. 

The BEAM’s oculomotor assessment capabilities potentially offer more sensitive
response validity metrics than existing continuous performance test metrics. As
described above, oculomotor functioning and the BEAM’s saccadic reaction time metrics
appear to be resistant to the confounding effects of depression and intelligence (19; 117;
201). The BEAM therefore presents a unique opportunity to explore valid and invalid
performance in both saccadic and manual response metrics. Manual responding can be
compared with existing continuous performance tests, and oculomotor responding can be
used to determine utility above and beyond the manual response metrics.
Scope of Project

Neuropsychological  assessment  is  commonly  used  for  determining  impairment 
from  traumatic  brain  injury,  a  widely  prevalent  injury  in  civilian  and  military  contexts. 
Valid  symptom  report  and  test  performance  are  essential  prerequisites  for  the  accurate 
interpretation  of  neuropsychological  data.  Unfortunately,  base  rates  of  invalid 
responding  in  civilian  and  military  contexts  suggest  that  symptom  exaggeration  and 
underperformance  are  common  in  neuropsychological  assessment.  Many  freestanding 
and  embedded  validity  indicators  have  been  developed  and  derived  to  detect  invalid 
responding,  but  these  measures  are  limited  by  a  variety  of  factors  that  dilute  their 
classification  accuracy. 

This  dissertation  project  evaluated  a  novel  eye-tracking  tool,  the  BEAM,  as  a 
method  for  detecting  invalid  responding  in  neurocognitive  assessment.  This  project 
followed  Bianchini,  Greve,  and  Glynn’s  (28)  guidelines  for  symptom  and  performance 
validity  research  by  1)  utilizing  a  method  operationalizing  the  construct  of  interest  (e.g., 
noncompliance,  malingering,  invalid  performance),  2)  reporting  sensitivity,  specificity, 
and  predictive  power,  3)  prioritizing  specificity  over  sensitivity  when  determining  the 
overall  classification  rate  of  invalid  performance  detection  techniques,  and  4)  considering 
the  purity  of  the  criterion  groups  (valid  controls  vs.  invalid  responders)  when  estimating  a 
technique’s  classification  accuracy.  The  study  utilized  a  “combined  groups”  design  that 
incorporated  a  well-controlled  simulator  study  and  a  known-group  comparison.  The 
intent  of  the  project  was  to  determine  the  invalid  performance  classification  accuracy  of 
saccadic  and  manual  BEAM  metrics  with  the  goal  of  identifying  useful  embedded  indices 
that  may  be  used  to  detect  invalid  responding. 
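Guideline 2 above calls for reporting sensitivity, specificity, and predictive power. As an illustration only (statistical analyses in this project were run in SPSS), the Python sketch below shows how these statistics follow from a single cut score applied to a valid and an invalid group. It assumes that higher scores indicate worse performance, and the function names are hypothetical. Because predictive power depends on the base rate of invalid responding in the sample, a second helper re-estimates positive predictive power for a different assumed base rate using Bayes' theorem.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ClassificationStats:
    sensitivity: float  # proportion of invalid responders flagged
    specificity: float  # proportion of valid responders not flagged
    ppv: float          # positive predictive power in this sample
    npv: float          # negative predictive power in this sample


def classify_at_cut(invalid: Sequence[float], valid: Sequence[float],
                    cut: float) -> ClassificationStats:
    """Flag a score as invalid when it is >= cut (assumes higher scores = worse)."""
    tp = sum(s >= cut for s in invalid)  # invalid responders correctly flagged
    fn = len(invalid) - tp               # invalid responders missed
    fp = sum(s >= cut for s in valid)    # valid responders wrongly flagged
    tn = len(valid) - fp                 # valid responders correctly passed
    nan = float("nan")
    return ClassificationStats(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        ppv=tp / (tp + fp) if tp + fp else nan,
        npv=tn / (tn + fn) if tn + fn else nan,
    )


def ppv_at_base_rate(sensitivity: float, specificity: float,
                     base_rate: float) -> float:
    """Positive predictive power re-estimated for an assumed base rate."""
    true_pos = sensitivity * base_rate
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)
```

For example, with a hypothetical sensitivity of 1.00 and specificity of .75, positive predictive power at an assumed 40% base rate is about .73.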
Specific Aims and Hypotheses

How does the Bethesda Eye & Attention Measure (BEAM) perform in
discriminating between valid and invalid responding among healthy persons? How do
the BEAM’s embedded response validity metrics compare with existing response validity
tests (RVTs)? How do valid and invalid responding on the BEAM compare between
groups of healthy persons and persons with a history of mild traumatic brain injury?

Three  specific  aims  of  this  project  were  proposed  to  answer  these  research  questions. 
Each  aim  is  supported  with  several  specific,  testable  hypotheses. 

Specific  Aim  1:  To  examine  relationships  between  invalid  responding  and 
performance  on  BEAM  metrics. 

The  first  aim  of  the  project  was  to  assess  the  relationship  of  BEAM  metrics  to 
valid  and  invalid  responding.  ROC  analyses  were  used  to  identify  BEAM  metrics  that 
best discriminated between valid and invalid responding. Classification accuracy statistics
were  determined  for  BEAM  metrics  with  significant  differences  between  groups. 
Subsequent  statistical  analyses  compared  those  metrics  between  groups  of  valid  and 
invalid  responders  without  a  history  of  TBI.  Optimal  cut  scores  were  identified  for 
significant  BEAM  metrics. 
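ROC area under the curve (AUC) can be read as the probability that a randomly chosen invalid responder scores worse than a randomly chosen valid responder. The sketch below computes AUC in that way (the Mann-Whitney formulation, with ties counted as one half) and then selects the cut score that maximizes sensitivity while holding specificity at or above .90, consistent with prioritizing specificity over sensitivity. It is a minimal illustration, not the SPSS procedure used in this project. It assumes higher scores indicate invalid responding, so a metric on which lower scores are worse would be negated first; the .90 floor is a parameter and the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def auc(invalid: Sequence[float], valid: Sequence[float]) -> float:
    """P(random invalid score > random valid score), ties counted as 0.5."""
    wins = 0.0
    for a in invalid:
        for b in valid:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(invalid) * len(valid))


def cut_at_min_specificity(
    invalid: Sequence[float],
    valid: Sequence[float],
    min_specificity: float = 0.90,
) -> Optional[Tuple[float, float, float]]:
    """Return (cut, sensitivity, specificity) for the flag rule 'score >= cut'.

    Among observed scores, picks the cut with the highest sensitivity whose
    specificity is at least min_specificity; None if no cut qualifies.
    """
    best = None
    for cut in sorted(set(invalid) | set(valid)):
        specificity = sum(s < cut for s in valid) / len(valid)
        sensitivity = sum(s >= cut for s in invalid) / len(invalid)
        if specificity >= min_specificity and (best is None or sensitivity > best[1]):
            best = (cut, sensitivity, specificity)
    return best
```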

Hypothesis  1A:  The  invalid  responding  group  will  demonstrate  significantly 
poorer  compliance  with  test  instructions  than  the  valid  responding  group.  To  test  this 
hypothesis,  one  variable  representing  the  number  of  trials  invalidated  from  incorrect 
initial  fixations  (i.e.,  not  looking  at  the  center  of  the  screen  as  instructed)  was  submitted 
to  ROC  analyses. 
Hypothesis 1B: The invalid responding group will have significantly slower
reaction time than the valid responding group. To test this hypothesis, twelve reaction
time variables were submitted to ROC analyses. These variables included Saccadic
Reaction Time (SacRT) and Manual Reaction Time (ManRT) for the overall measure and
five individual trial types.

Hypothesis  1C:  The  invalid  responding  group  will  have  significantly  greater 
reaction  time  intra-individual  variability  than  the  valid  responding  group.  To  test  this 
hypothesis,  twelve  reaction  time  variability  metrics  were  submitted  to  ROC  analyses. 
These variables included Saccadic Reaction Time Intra-Individual Variability (SacRT-IIV)
and Manual Reaction Time Intra-Individual Variability (ManRT-IIV) for the overall
measure and five individual trial types.
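The hypotheses do not specify how intra-individual variability is computed. A common operationalization in the reaction time literature is the within-person standard deviation of trial-level reaction times, sometimes divided by the person's mean reaction time (the coefficient of variation). The Python sketch below assumes those two options and is not necessarily the BEAM's scoring rule; trials that were not successfully recorded are passed as None and skipped. Applied to saccadic and manual responses for the overall measure and each of the five trial types, a function like this would yield the twelve variability metrics (2 modalities × 6 = 12). The function name is hypothetical.

```python
import statistics
from typing import Optional, Sequence


def rt_iiv(trial_rts_ms: Sequence[Optional[float]], method: str = "sd") -> float:
    """Intra-individual variability of one person's reaction times.

    method="sd": within-person sample standard deviation (ms).
    method="cv": standard deviation divided by the person's mean RT (unitless).
    """
    rts = [rt for rt in trial_rts_ms if rt is not None]  # drop unrecorded trials
    if len(rts) < 2:
        raise ValueError("need at least two recorded trials")
    sd = statistics.stdev(rts)  # sample (n - 1) standard deviation
    if method == "sd":
        return sd
    if method == "cv":
        return sd / statistics.mean(rts)
    raise ValueError(f"unknown method: {method}")
```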

Hypothesis 1D: The invalid responding group will have significantly more
commission  errors  than  the  valid  responding  group.  Commission  errors  were  measured 
as  a  ratio  of  commission  errors  per  number  of  successfully  recorded  inhibition  trials.  To 
test  this  hypothesis,  two  variables — Saccadic  Commission  Error  Percentage  (SacCom%) 
and  Manual  Commission  Error  Percentage  (ManCom%) — were  submitted  to  ROC 
analyses. 

Hypothesis 1E: The invalid responding group will have significantly more
omission errors than the valid responding group. Omission errors were measured as a
ratio  of  omission  errors  per  number  of  successfully  recorded  non-inhibition  trials.  To  test 
this  hypothesis,  two  variables — Saccadic  Omission  Error  Percentage  (SacOm%)  and 
Manual  Omission  Error  Percentage  (ManOm%) — were  submitted  to  ROC  analyses. 
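Both error metrics use successfully recorded trials, not administered trials, as the denominator. The sketch below makes that explicit. It assumes a commission error is any response (saccade or button press) on an inhibition trial and an omission error is a missing response on a non-inhibition trial; the Trial record and its field names are hypothetical stand-ins for the BEAM's trial-level output.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class Trial(NamedTuple):
    inhibition: bool  # True if the trial required withholding a response
    recorded: bool    # False if signal loss made the trial unscorable
    responded: bool   # True if a saccade or button press occurred


def error_percentages(
    trials: Iterable[Trial],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (commission %, omission %) using successfully recorded trials only."""
    recorded = [t for t in trials if t.recorded]
    inhibition = [t for t in recorded if t.inhibition]
    non_inhibition = [t for t in recorded if not t.inhibition]
    # Commission: responding when the trial called for inhibition.
    commission = (100.0 * sum(t.responded for t in inhibition) / len(inhibition)
                  if inhibition else None)
    # Omission: failing to respond when a response was required.
    omission = (100.0 * sum(not t.responded for t in non_inhibition) / len(non_inhibition)
                if non_inhibition else None)
    return commission, omission
```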
Specific Aim 2: To compare the invalid responding classification accuracy
of BEAM metrics to existing RVTs.

The  second  aim  of  the  project  was  to  compare  the  BEAM  metrics  that  have  been 
shown  to  differentiate  valid  from  invalid  responding  to  existing  response  validity  test 
metrics.  To  address  the  hypotheses  under  this  specific  aim,  ROC  analyses  were 
conducted  to  identify  freestanding  and  embedded  response  validity  test  metrics  that  best 
discriminated  between  the  valid  and  invalid  responding  groups.  Subsequent  statistical 
analyses compared those metrics between the experimental groups. Classification
accuracy statistics were determined for freestanding and embedded metrics with
significant differences between groups. Optimal cut scores were identified for significant
freestanding  and  embedded  response  validity  metrics.  The  classification  abilities  of 
BEAM  metrics  were  then  compared  to  the  embedded  and  freestanding  response  validity 
test  metrics. 

Hypothesis  2A:  The  BEAM  will  provide  incremental  predictive  value  above  and 
beyond  the  classification  accuracy  of  embedded  response  validity  tests.  To  test  this 
hypothesis,  the  results  of  the  Trail  Making  Test  (TMT;  212),  Conners’  Continuous 
Performance  Test-Second  Edition  (CPT-II;  55),  and  the  Digit  Span  subtest  from  the 
Wechsler  Adult  Intelligence  Scale-Fourth  Edition  (WAIS-IV;  278)  were  submitted  to 
ROC  analyses.  The  classification  accuracy  of  TMT,  CPT-II,  and  the  Digit  Span  variables 
with  sufficient  AUC  were  compared  to  BEAM  metrics  using  logistic  regression. 

Hypothesis  2B:  The  BEAM  will  provide  incremental  predictive  value  above  and 
beyond  the  classification  accuracy  of freestanding  response  validity  tests.  To  test  this 
hypothesis,  the  results  of  the  Victoria  Symptom  Validity  Test  (VSVT;  243)  and  the 
Medical Symptom Validity Test (MSVT; 94) were submitted to ROC analyses. The
classification  accuracy  of  VSVT  and  the  MSVT  variables  with  sufficient  AUC  were 
compared  to  BEAM  metrics  using  logistic  regression. 

Hypothesis  2C:  The  BEAM  will  provide  incremental  predictive  value  above  and 
beyond  the  classification  accuracy  of  both  embedded  and  freestanding  response  validity 
tests.  All  variables  with  sufficient  AUC  from  embedded  indices,  freestanding  measures, 
and  the  BEAM  were  loaded  into  a  hierarchical  logistic  regression  model  to  determine  the 
relative  contribution  of  each  metric. 
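Incremental predictive value in a hierarchical logistic regression is commonly judged with a likelihood-ratio (change in deviance) test: the RVT predictors are entered first, the BEAM predictors are added as a second block, and twice the gain in log-likelihood is compared to a chi-square distribution with degrees of freedom equal to the number of added predictors. The NumPy sketch below fits each model by Newton-Raphson and returns that statistic. It is an illustration under these assumptions, not the SPSS output used in this project; a p-value could be obtained with, for example, `scipy.stats.chi2.sf(chi2, df)`. When a metric separates groups almost perfectly in a small sample, maximum-likelihood estimates may fail to converge, so such fits need checking. The function names are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, max_iter=100, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    X is an (n, k) design matrix that already includes an intercept column;
    y holds 0/1 outcomes (1 = invalid responder). Returns (beta, log-likelihood).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        gradient = X.T @ (y - p)
        information = X.T @ (X * (p * (1.0 - p))[:, None])  # X' W X
        step = np.linalg.solve(information, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-(X @ beta))), 1e-12, 1 - 1e-12)
    log_lik = float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
    return beta, log_lik


def incremental_lr_test(X_block1, X_block2, y):
    """Likelihood-ratio chi-square for adding block 2 (e.g., BEAM metrics)
    to block 1 (intercept plus RVT metrics). Returns (chi2, df)."""
    _, ll_reduced = fit_logistic(X_block1, y)
    _, ll_full = fit_logistic(np.column_stack([X_block1, X_block2]), y)
    df = 1 if X_block2.ndim == 1 else X_block2.shape[1]
    return 2.0 * (ll_full - ll_reduced), df
```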

Specific  Aim  3:  To  evaluate  BEAM  and  embedded  RVT  performance  between 
simulator  study  participants  and  valid  responders  with  a  history  of  mild  traumatic 
brain  injury. 

The third aim of the project was to compare the performance of the simulator
study’s  valid  and  invalid  responding  groups  to  research  participants  with  a  history  of  mild 
TBI.  The  parent  study’s  TBI  cohort  was  screened  in  order  to  exclude  participants  with 
moderate-to-severe  TBI  and  participants  who  demonstrated  invalid  responding.  The 
remaining  group  of  valid  responders  with  a  history  of  mild  TBI  was  compared  to  the 
simulator  study  groups  in  order  to  identify  any  metrics  that  may  incorrectly  classify 
actual  clinical  group  members  as  invalid  responders.  Optimal  cut  scores  were  identified 
for  the  mild  TBI  clinical  group. 

Hypothesis  3A:  Of  the  previously  identified  optimal  BEAM  and  embedded  RVT 
variables,  there  will  be  no  significant  performance  differences  between  valid  responders 
with  and  without  a  history  of  mild  TBI.  To  test  this  hypothesis,  BEAM  and  embedded 
RVT metrics were submitted to between-group comparisons and post-hoc analyses.
Classification  accuracy  statistics  were  calculated  with  the  clinical  group. 

Hypothesis  3B:  Of  the  previously  identified  optimal  BEAM  and  embedded  RVT 
variables, there will be significant performance differences between invalid responders
and  valid  responders  with  a  history  of  mild  TBI.  To  test  this  hypothesis,  BEAM  and 
embedded  RVT  metrics  were  submitted  to  between-group  comparisons  and  post-hoc 
analyses.  Classification  accuracy  statistics  were  calculated  with  the  clinical  group. 
CHAPTER 2: Methods


Study  Design 

This  dissertation  project’s  study  design  utilized  a  combined  groups  design  to 
maximize  internal  and  external  validity  of  its  results.  In  a  combined  groups  design, 
researchers apply results from a simulator study to a group or groups that have met a
given criterion for classification (i.e., known groups; 218). In this dissertation project,
results  from  a  prospective  simulator  study  were  compared  to  a  clinical  group  of  subjects 
with  a  history  of  mild  TBI  that  met  criteria  for  valid  responding. 

The  core  component  of  this  dissertation  project  was  a  prospective,  experimental 
simulator study that compared neurocognitive and oculomotor performance between
groups of healthy persons⁹ with and without an experimental manipulation to perform
poorly.  Group  participation  was  randomly  assigned  and  blinded  to  examiners.  Between- 
group  comparisons  of  valid  and  invalid  responders  were  used  to  generate  classification 
accuracy  statistics  for  the  BEAM  and  other  neurocognitive  measures  that  could  be 
compared  to  previous  research. 

To enhance the generalizability of the simulator study’s results, data from a “real
world” TBI sample drawn from the general population were used for clinical comparisons.
These TBI data were collected as part of this project’s parent study: “Eye Tracking
Indicators of Neurocognitive Status after Traumatic Brain Injury” (Principal Investigator:
Mark L. Ettenhofer, Ph.D.). The parent study is a correlational study designed to assess
and compare cognitive performance in people with and without a history of traumatic
brain injury, using the BEAM and neurocognitive measures as dependent variables. The
parent study was approved by the Institutional Review Board (IRB) at Uniformed
Services University of the Health Sciences (USUHS; see Appendix C: Administrative
Documents).

9 No history of TBI or medical conditions/medications that would impact cognitive functioning.

Several  neurocognitive  measures  with  empirically  derived  embedded  response 
validity  indices  were  used  to  separate  the  TBI  cohort  into  known  groups  of  valid  and 
invalid  responders.  The  majority  of  the  subjects  in  the  TBI  cohort  had  a  history  of  mild 
traumatic  brain  injuries,  and  most  of  these  subjects  met  criteria  for  valid  responding. 
There  were  insufficient  numbers  of  invalid  responders  with  a  history  of  mild  TBI  or 
subjects  with  moderate-to-severe  TBI  to  power  analyses.  Given  the  available  data,  only 
subjects  with  a  history  of  mild  TBI  who  met  criteria  for  valid  responding  were  included 
in  the  “known”  clinical  comparison  group.  Results  from  the  prospective  simulator  study 
were  compared  to  the  parent  study  data. 

Participants 

As  stated  above,  the  dissertation  project  evaluated  three  groups  of  responders:  an 
invalid  responding,  biased  group  of  responders  without  a  history  of  TBI  (the  “BR” 
group);  a  valid  responding,  unbiased  group  of  responders  without  a  history  of  TBI  (the 
“UR”  group);  and  a  valid,  unbiased  group  of  responders  with  a  history  of  mild  TBI  (the 
“UR-mTBI”  group).  The  following  sections  describe  the  inclusion  and  exclusion  criteria 
for each of the three groups. Of note, the UR-mTBI group’s inclusion and exclusion
criteria are equivalent to those of the ongoing parent study.
BR and UR Groups

The  biased  responding  (BR)  and  unbiased  responding  (UR)  groups  consisted  of 
persons  recruited  exclusively  for  this  dissertation  project.  Participants  for  the  BR  and  UR 
groups  were  recruited  using  flyers,  internet  advertisements,  and  hand-outs.  Participants 
were  compensated  $30  for  their  involvement  in  the  study  unless  they  were  active  duty 
U.S.  military  or  federal  employees.  The  following  inclusion  criteria  were  used  to  screen 
potential  participants:  must  be  18  years  or  older,  must  have  fluency  or  literacy  in  English 
(per  self-report),  must  be  willing  and  able  to  provide  informed  consent,  and  must  have 
obtained written permission from supervisor and/or brigade commander if they were
federal civilians or active duty U.S. military. Participants were not allowed to participate
in this study if they had ever sustained a traumatic brain injury of any severity
during their lifetime, including any head injuries that involved an alteration of
consciousness  (AOC).  Participants  were  also  excluded  if  they  had  a  medical  condition 
(e.g.,  thyroid  disorder,  sickle  cell  anemia)  or  were  actively  taking  medication  that  could 
impair  their  cognitive  abilities,  if  they  had  any  visual  impairment  that  could  not  be 
corrected  by  glasses/contacts,  or  if  they  had  motor  impairment  or  amputation  of  one  or 
both upper extremities. Any participant in the UR group who exceeded cut score
thresholds on one or more freestanding RVTs and/or two or more embedded RVTs (see
Appendix A: Table 2) was excluded from all analyses (i.e., he or she was not analyzed as
a BR group member).
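The UR exclusion rule combines two thresholds. A minimal sketch follows, assuming each RVT has already been scored as a pass/fail flag against its Appendix A cut score. The cut scores themselves are not reproduced here, the direction of "exceeding" a threshold differs by measure, and the function name is hypothetical.

```python
from typing import Sequence


def exclude_from_ur_group(freestanding_failed: Sequence[bool],
                          embedded_failed: Sequence[bool]) -> bool:
    """True if an unbiased-group participant should be dropped from all analyses.

    Each flag is True when the participant's score fell beyond that test's
    cut score (flags are assumed to be computed beforehand).
    """
    n_freestanding = sum(freestanding_failed)
    n_embedded = sum(embedded_failed)
    # "one or more freestanding RVTs and/or two or more embedded RVTs"
    return n_freestanding >= 1 or n_embedded >= 2
```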

UR-mTBI  Group 

The unbiased responders with self-reported history of mild traumatic brain injury
(UR-mTBI) group consisted of participants in this project’s parent study who reported a
history of at least one mild TBI. In accordance with DoD/VA clinical practice guidelines
(66), mild TBIs were defined as events that involved a sudden movement or a blow to the
head that resulted in a loss of consciousness (LOC) ranging from 0 to 30 minutes or loss
of memory (i.e., post-traumatic amnesia; PTA) less than or equal to 24 hours. While the
American Congress of Rehabilitation Medicine (ACRM) also qualifies alterations of
consciousness (AOC) as “mild TBIs” (142), any subject reporting AOC without LOC or
PTA was not included in the parent study’s (or this project’s) mild TBI group. Consistent
with other studies using TBI samples (159; 260), head injury information from the parent
study sample could not be verified by medical record. To obtain head injury details,
examiners used a semi-structured interview that obtained detailed information about
injury characteristics, mechanism of injury, and injury sequelae. Follow-up questions
were asked as needed to provide a comprehensive understanding of the injury or injuries.
A  week  after  the  participant  completed  the  assessment,  a  team  consisting  of  two  licensed 
psychologists  with  post-doctoral  fellowship  training  in  clinical  neuropsychology  and 
three-to-five  clinical  psychology  doctoral  students  carefully  considered  the  accuracy  and 
context  of  the  self-reported  injury  characteristics  and  classified  the  individual  based  on 
the  person’s  most  severe  head  injury.  The  potential  classifications  included  “no  TBI,” 
“possible  mild  TBI  (AOC  only),”  “mild  TBI,”  “moderate  TBI,”  and  “severe  TBI.” 

Participants in the UR-mTBI group were recruited using flyers, internet
advertisements, hand-outs, and newspaper advertisements. All UR-mTBI group
participants were told prior to the assessment that they would be compensated $40 for
their involvement in the study unless they were ineligible for compensation (i.e., active
duty military or federal employees). The following inclusion criteria were applied to all
parent study participants: must be 18 years or older, must have a history of one or more
head injuries with a loss of consciousness or memory, must have fluency or literacy in
English (per self-report), must be willing and able to provide informed consent, and must
have obtained written permission from supervisor and/or brigade commander if they were
federal civilians or U.S. military. Participants were excluded from the parent study if
they  had  a  medical  condition  (e.g.,  thyroid  disorder,  sickle  cell  anemia)  or  were  actively 
taking  medication  that  could  impair  their  cognitive  abilities,  if  they  had  any  visual 
impairment  that  could  not  be  corrected  by  glasses/contacts,  or  if  they  had  motor 
impairment  or  amputation  of  one  or  both  upper  extremities.  Parent  study  participants 
were  excluded  from  this  study’s  UR-mTBI  group  if  their  head  injuries  were  in  the 
moderate-to-severe  range  (LOC  >  30  minutes  or  PTA  >  1  day)  or  if  their  head  injuries 
did  not  involve  a  loss  of  consciousness  or  post-traumatic  amnesia  (i.e.,  possible  mild  TBI 
[AOC  only]).  Lastly,  parent  study  participants  were  excluded  from  the  UR-mTBI  group 
if  they  met  criteria  for  invalid  responding,  which  was  defined  as  exceeding  one  or  more 
of  the  eight  empirically  derived  embedded  RVT  cutoff  score  thresholds  of  90% 
specificity  or  greater  (see  Appendix  A:  Table  2). 
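The injury-severity thresholds and the validity rule above can be summarized as a screening function. This is only a sketch: in the parent study, injuries were classified by a clinical team reviewing the semi-structured interview, not by a mechanical rule, and the code encodes only the numeric thresholds stated in the text. LOC and PTA durations are passed as None when absent, so a momentary LOC can be recorded as 0 minutes. Participants are classified by their most severe injury, and the eight embedded RVT results are assumed to be precomputed pass/fail flags. The function names and labels are hypothetical.

```python
from typing import Optional, Sequence, Tuple

SEVERITY_ORDER = ["no TBI", "possible mild TBI (AOC only)",
                  "mild TBI", "moderate-to-severe TBI"]


def classify_injury(loc_minutes: Optional[float], pta_hours: Optional[float],
                    aoc: bool) -> str:
    """Classify one head injury; None means the feature was absent."""
    if ((loc_minutes is not None and loc_minutes > 30)
            or (pta_hours is not None and pta_hours > 24)):
        return "moderate-to-severe TBI"
    if loc_minutes is not None or pta_hours is not None:
        return "mild TBI"  # LOC of 0-30 minutes or PTA of 24 hours or less
    if aoc:
        return "possible mild TBI (AOC only)"
    return "no TBI"


def eligible_for_ur_mtbi(
    injuries: Sequence[Tuple[Optional[float], Optional[float], bool]],
    embedded_failed: Sequence[bool],
) -> bool:
    """injuries: (loc_minutes, pta_hours, aoc) tuples, one per reported injury."""
    if len(embedded_failed) != 8:
        raise ValueError("expected results for eight embedded RVT indices")
    worst = max((classify_injury(*injury) for injury in injuries),
                key=SEVERITY_ORDER.index, default="no TBI")
    # Most severe injury must be mild TBI, and no embedded cut score exceeded.
    return worst == "mild TBI" and not any(embedded_failed)
```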

Setting  and  Equipment 
Setting 

The room used for testing was located in Dr. Mark Ettenhofer’s research
laboratory (Room B1032 on the USUHS campus). During computer-based tasks, the
participant  sat  at  a  desk  with  a  computer  monitor  and  eye  tracker,  and  the  examiner  sat 
five  feet  behind  the  participant  at  a  desk  facing  90  degrees  from  where  the  participant 
was  facing.  During  non-computer-based  measures,  including  the  semi-structured 
interview, the participant and examiner faced each other and used the examiner’s desk as
the  assessment  surface. 

Computers 

Two  computers  were  used  during  the  study:  a  “stimulus  computer”  used  by 
participants  and  a  “control  computer”  used  by  examiners  (see  Appendix  C:  Pictures). 

The  stimulus  computer  was  used  to  present  computerized  measures  to  the  subject.  The 
stimulus  computer  was  a  Dell  Precision  T1500  with  an  Intel  Core  i7  860  CPU,  2.80  GHz 
processing  speed.  Subjects  viewed  computerized  measures  on  a  15”  Asus  VW193  flat- 
screen  monitor  set  to  1440  x  900  pixel  resolution.  Examiners  used  the  control  computer 
to  run  programs  on  the  stimulus  computer  and  record  eye-tracking  data.  The  control 
computer  was  a  custom-built  PC  with  a  Pentium  Dual-Core  E5400  CPU,  2.70  GHz 
processing  speed. 

Eye-Tracking  Device 

Eye  tracking  was  performed  using  an  Applied  Science  Laboratories  (ASL)  D6 
High-Speed  (HS)  Desktop  Eye  Tracker  (see  Appendix  C:  Pictures).  The  primary 
components  of  the  eye  tracker  included  a  high  speed  camera  to  record  visual  information 
from  the  eye  and  an  infrared  illuminator  to  provide  a  corneal  reflection  from  which  eye 
gaze vectors can be computed. This infrared illuminator operated within the spectral
range of 760 to 1400 nanometers at intensities of <0.5 mW/cm² to 0.7 mW/cm², well
below the maximum safe chronic ocular exposure value of 10 mW/cm². This system used
non-coherent illumination; there were no lasers in the system. The desktop-mounted
D6-HS system did not require chinrests or other head stabilizers; it was chosen for its
inconspicuous design and for its enhanced participant comfort.

Response  Pad 

A  Cedrus  RB-530  response  pad  was  used  to  record  manual  (i.e.,  button  press) 
response  time  with  1  millisecond  time  resolution  (see  Appendix  C:  Pictures).  The  Cedrus 
response  pad  was  chosen  to  allow  a  higher  level  of  time  resolution  relative  to  buttons  on  a 
standard  computer  keyboard. 

Software 

ASL  Results  Version  1.0  was  used  to  analyze  eye  tracking  data.  E-Prime  2.0 
software,  a  suite  of  applications  used  in  computerized  experiment  design,  data  collection, 
and  analysis,  was  used  to  run  the  BEAM.  E-Prime  2.0  software  enabled  paradigm 
developers  to  use  signal  codes  called  “XDATs”  to  mark  events  that  occur  throughout 
computer-based  measures.  By  marking  certain  events  (e.g.,  trial  begins,  target  appears, 
button  is  pressed,  etc.),  developers  could  synchronize  participant  responses  with 
paradigm  activity.  SPSS  Version  20  was  used  for  statistical  analyses. 

Data  Acquisition  and  Post-Processing 
Data  Acquisition  Procedure 

The  D6-HS  system  used  a  two-computer  interface.  Participants  completed 
computerized  assessments  at  the  stimulus  computer,  where  the  eye  tracker  and  response 
pad  recorded  oculomotor  activity  and  manual  responses  (i.e.,  button  presses), 
respectively.  Examiners  sat  at  a  control  computer  with  live-feed  video  monitors  and  an 
ASL  data  processing  unit.  Cables  connected  the  two  computers,  synchronizing 
participant responses, eye movements, and assessment events, enabling examiners to
calibrate  participants  and  monitor  their  gaze  in  real  time. 

Participants  sat  with  their  head  approximately  24”  from  and  level  with  the  center 
of  the  stimulus  computer  monitor.  The  D6-HS  was  positioned  directly  below  the 
monitor,  facing  the  participant.  The  examiner  sat  behind  the  participant  at  the  control 
computer.  The  examiner  oriented  the  D6-HS  camera  onto  the  participant’s  right  eye. 
Next,  the  eye  tracker  was  calibrated  by  having  the  participant  gaze  sequentially  at  a  series 
of  dots  presented  on  the  stimulus  display.  The  calibration  process  took  approximately  2 
minutes to complete. Data were then collected at 120 Hz by recording eye tracking
data synchronized with event markers related to the presentation of stimuli.

A  parallel  cable  connecting  the  stimulus  computer  and  control  computer  enabled 
BEAM  events  (e.g.,  trial  beginning,  stimulus  appearing,  etc.)  to  be  synchronized  in  real 
time  with  manual  and  oculomotor  data  collection.  In  a  given  trial,  the  stimulus  computer 
sent  XDAT  codes  that  signaled  when  trials  began,  when  visual  stimuli  were  presented  on 
screen, when buttons were pressed, and when trials ended. Because every data segment
collected during the BEAM used a specific XDAT code, ASL software was able to
perform trial-by-trial analysis after the participant completed the BEAM. The data output
enabled  examiners  to  observe  BEAM  activity  during  a  given  trial,  identify  where  a 
person  was  looking  throughout  a  given  trial,  and  collect  button  press  data. 
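Because every sample carries the XDAT value active at that moment, trial boundaries and stimulus onsets can be recovered from the samples where the value changes, and sample counts can be converted to milliseconds using the 120 Hz rate (about 8.3 ms per sample, which bounds the timing resolution of saccadic latencies). The sketch below assumes the XDAT value is held on each sample until the next event and uses made-up code values (10, 20, 90); the BEAM's actual codes and the ASL export format are not described here. It also assumes the saccade onset sample has already been identified by the eye tracking analysis software. All names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

SAMPLE_RATE_HZ = 120  # recording rate reported in the text
# Hypothetical XDAT values for illustration only.
TRIAL_START, TARGET_ON, TRIAL_END = 10, 20, 90


def event_onsets(xdat: Sequence[int]) -> List[Tuple[int, int]]:
    """(sample_index, code) for every sample where the XDAT value changes."""
    onsets, prev = [], None
    for i, code in enumerate(xdat):
        if code != prev:
            onsets.append((i, code))
            prev = code
    return onsets


def trial_windows(xdat: Sequence[int]) -> List[Tuple[int, int]]:
    """Pair each trial-start onset with the next trial-end onset."""
    windows, start = [], None
    for i, code in event_onsets(xdat):
        if code == TRIAL_START:
            start = i
        elif code == TRIAL_END and start is not None:
            windows.append((start, i))
            start = None
    return windows


def saccadic_rt_ms(xdat: Sequence[int], window: Tuple[int, int],
                   saccade_onset: int) -> Optional[float]:
    """Latency from target onset to saccade onset within one trial, in ms."""
    start, end = window
    target = next((i for i, c in event_onsets(xdat)
                   if c == TARGET_ON and start <= i < end), None)
    if target is None or not (target <= saccade_onset < end):
        return None  # no target in this trial, or saccade outside the window
    return (saccade_onset - target) * 1000.0 / SAMPLE_RATE_HZ
```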

Data  Post-Processing 

The eye tracking data noted above were first filtered to remove blinks, out-of-range
values, and other potential sources of error. Fixations and saccades were then computed
with  the  ASL  eye  tracking  analysis  software  using  established  algorithms.  Custom 
scoring software was then used to perform data acquisition checks to enhance confidence
in  the  obtained  data,  identifying  trials  in  which  momentary  visual  signal  loss  prevented 
reliable  reaction  time  or  inhibition  error  calculations.  After  screening  the  data  for 
unsuccessfully  recorded  trials,  the  custom  scoring  software  derived  task-  and  trial- 
dependent  variables  from  eye  gaze  and  motor  responses.  These  values  were  then 
collapsed  across  multiple  task  trials  in  order  to  obtain  summary  metrics  relevant  to  each 
task  (e.g.,  median  saccadic  reaction  times  for  specific  trial  types,  commission  error 
percentages).  At  least  10  successfully  recorded  trials  (out  of  32)  were  required  to  obtain 
a  summary  median  reaction  time  or  inhibition  error  metric  for  each  trial  type. These 
derived  summary  metrics  were  then  used  in  primary  analyses  of  interest,  similar  to  the 
summary  scores  of  traditional  cognitive  tests. 
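The summary rule above (a median computed only when at least 10 of 32 trials were successfully recorded) can be sketched as follows. Unsuccessfully recorded trials are represented as None; how the custom scoring software flags them is not reproduced here, and the function name is hypothetical.

```python
import statistics
from typing import Optional, Sequence

MIN_RECORDED_TRIALS = 10  # threshold stated in the text (out of 32 trials)


def summary_median_rt(trial_rts: Sequence[Optional[float]]) -> Optional[float]:
    """Median RT for one trial type, or None if too few trials were recorded."""
    recorded = [rt for rt in trial_rts if rt is not None]  # skip signal-loss trials
    if len(recorded) < MIN_RECORDED_TRIALS:
        return None
    return statistics.median(recorded)
```

With 22 of 32 trials lost, the ten remaining trials still meet the threshold; with only nine, the metric is left missing rather than estimated from too few trials.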

Independent  Variables 

There  are  two  independent  variables  in  this  dissertation  project,  each  with  two 
levels.  The  first  independent  variable  is  “invalid  responding  bias”  (yes  or  no).  The 
second  independent  variable  is  “head  injury”  (yes  or  no).  This  project  included  three 
groups:  unbiased  responders  with  a  history  of  mild  traumatic  brain  injury  (UR-mTBI), 
unbiased  responders  without  a  history  of  head  injury  (UR),  and  biased  responders  without 
a  history  of  head  injury  (BR). 

The  head  injury  independent  variable  was  manipulated  quasi-experimentally 
through  recruitment,  semi-structured  interview,  and  clinical  panel  consensus.  Invalid 
responding  bias  was  manipulated  in  two  different  ways,  depending  on  the  study  group. 
For  the  parent  study’s  mild  TBI  cohort,  existing  embedded  RVT  cut  scores  (see 
Appendix  A:  Table  2)  were  used  to  exclude  potential  invalid  responders.  As  such, 
invalid responding bias in the TBI cohort was manipulated quasi-experimentally. By
contrast, invalid responding bias in the non-TBI groups (i.e., groups from the simulator
study) was manipulated experimentally through random group assignment and group-specific
scenarios presented to participants prior to their assessment. Specifically,
simulator study participants were randomly assigned to either the BR or UR group and
given a scenario that asked them to perform as if they had sustained a head injury (i.e., biased
responding; BR group) or to perform their best (i.e., unbiased responding; UR group). To
enhance  internal  validity  of  the  experimental  manipulation,  examiners  were  blinded  to 
group  assignment  throughout  the  assessment,  scoring,  and  data  entry  of  simulator  study 
participants. 

Dependent  Variables  (Measures) 

This  study  compared  performance  between  groups  of  valid  and  invalid  responders 
on  the  Bethesda  Eye  &  Attention  Measure  (BEAM),  neurocognitive  tasks  with  embedded 
response  validity  indices,  and  freestanding  response  validity  tests  (RVTs).  A  well- 
controlled  simulator  study  incorporating  several  RVTs  into  its  design  allowed  the  relative 
sensitivity  of  the  embedded  and  freestanding  RVTs  to  be  calculated  and  compared  to 
each  other  (96).  Furthermore,  by  incorporating  neurocognitive  measures  into  the  design, 
this  study  was  able  to  assess  the  ability  of  various  RVTs  to  predict  whether  or  not 
neurocognitive  test  scores  from  clinical  comparison  groups  were  accurate  (96).  Sample 
characterization measures were administered to determine group demographics and
identify  group  differences  in  age,  gender,  years  of  education,  race/ethnicity,  premorbid 
intelligence,  and  knowledge  of  TBI  sequelae. 
Sample Characterization Measures
Baseline  Interview 

Two  baseline  interviews  were  used  for  this  study.  The  first  baseline  interview  was 
used  solely  for  the  simulator  study  of  this  dissertation  project.  The  simulator  study 
Baseline Interview obtained demographic information (e.g., age, race/ethnicity), military
history  (if  applicable),  educational  background,  languages  spoken,  employment/disability 
status,  medical  history,  medications,  and  alcohol/nicotine/caffeine  use.  Please  see 
Appendix  C:  Simulator  Study  Baseline  Interview.  The  second  baseline  interview  used  in 
this  dissertation  project  was  drawn  from  the  parent  study’s  archival  data;  this  Parent 
Study  TBI  Cohort  Baseline  Interview  was  used  with  participants  with  a  self-reported 
history  of  TBI.  The  parent  study  Baseline  Interview  obtained  similar  information  from 
simulator  study  Baseline  Interview  plus  information  related  to  head  injuries,  activities  of 
daily  living,  and  treatment  history.  Please  see  Appendix  C:  Parent  Study  TBI  Cohort 
Baseline  Interview. 

Feedback  Interview 

For the simulator study, a lab member other than the examiner administered a
post-assessment interview to gather qualitative and quantitative information about test-taking
strategies, perceived performance, and experiences of study participation. The
feedback interview also assessed the examiner’s beliefs about the participant’s group
membership. The interview additionally served as a manipulation check on the primary
independent  variable  of  group  assignment.  Please  see  Appendix  C:  Simulator  Study 
Feedback  Interview. 
Wechsler Test of Adult Reading (WTAR)

The WTAR estimates premorbid intellectual functioning (124). Participants are
given a list of 50 words and asked to pronounce each word as best they can. The
measure ends when participants reach the 50-word limit or when they
mispronounce 12 consecutive words. The WTAR’s internal consistency (.90-.97)
and test-retest reliability (.90-.94) are excellent. The WTAR also positively correlates
with the Wechsler Adult Intelligence Scale-Third Edition (WAIS-III; 276) Full Scale IQ
(FSIQ; .63-.80) and Verbal Comprehension Index (VCI; .61-.80; 124).
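As a minimal sketch of the discontinue rule just described, assuming each pronunciation has already been judged correct or incorrect by the examiner, the raw score can be tallied as below. The function name and input format are illustrative assumptions, and the sketch does not convert raw scores into the WTAR's normed premorbid estimates.

```python
from typing import Sequence


def wtar_raw_score(correct: Sequence[bool], max_items: int = 50,
                   discontinue_after: int = 12) -> int:
    """Count correctly pronounced words, applying a discontinue rule.

    Hypothetical helper: `correct` holds one True/False judgment per word,
    in administration order. Scoring stops after `max_items` words or as soon
    as `discontinue_after` consecutive words are mispronounced.
    """
    score = 0
    consecutive_errors = 0
    for is_correct in list(correct)[:max_items]:
        if is_correct:
            score += 1
            consecutive_errors = 0  # a correct word resets the error run
        else:
            consecutive_errors += 1
            if consecutive_errors == discontinue_after:
                break  # discontinue criterion reached; later words are not scored
    return score


# 30 correct words, then 12 consecutive errors: the last 5 words are never reached
print(wtar_raw_score([True] * 30 + [False] * 12 + [True] * 5))  # 30
```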

Head  Injury  Knowledge  Scale  (HIKS) 

The HIKS assesses misconceptions about the effects of brain injury via
18 true/false items, with the number and type (“minimization” or
“overgeneralization”) of inaccurate responses reflecting the magnitude and type of
misconceptions (197). The HIKS was used to identify any between-group differences in knowledge of head injury
sequelae that may bias invalid responding approaches (71).
For  example,  if  the  BR  group  knew  significantly  more  about  cognitive  and  behavioral 
sequelae  of  TBI  than  the  UR  group,  the  BR  group  may  demonstrate  more  sophisticated 
invalid  responding  than  what  would  be  expected  from  the  general  population. 

The  HIKS  assesses  misconceptions  across  several  domains  impacted  by  traumatic 
brain  injury,  including  physical,  sensory-perceptual,  cognitive,  and  behavioral  domains. 
Participants  with  (Version  A)  and  without  (Version  B)  a  history  of  head  injury  are  asked 
to  indicate  whether  they  think  the  changes  referred  to  in  each  item  are  true  (“often  or 
most of the time”), or false (“never or rarely”). The HIKS contains an 8-item
“overgeneralization” subscale and a 6-item “minimization” subscale. HIKS Version B
was used in this dissertation project’s simulator study (see Appendix C: HIKS B).
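The scoring logic described above (counting inaccurate responses overall and by subscale) can be sketched as follows. The answer key and subscale item lists are hypothetical placeholders, not the published HIKS scoring key, and the toy example uses four items rather than the scale's 18.

```python
from typing import Dict, List


def hiks_error_counts(responses: Dict[int, bool],
                      answer_key: Dict[int, bool],
                      subscales: Dict[str, List[int]]) -> Dict[str, int]:
    """Count inaccurate true/false answers overall and within each subscale.

    `answer_key` and `subscales` are hypothetical inputs; the real item key
    and subscale membership come from the published HIKS. A missing response
    is counted as inaccurate here (an assumption).
    """
    inaccurate = {item for item, keyed in answer_key.items()
                  if responses.get(item) != keyed}
    counts = {"total": len(inaccurate)}
    for name, items in subscales.items():
        counts[name] = sum(1 for item in items if item in inaccurate)
    return counts


# Four-item toy example (the real scale has 18 items)
key = {1: True, 2: False, 3: True, 4: False}
subs = {"overgeneralization": [2, 4], "minimization": [1, 3]}
print(hiks_error_counts({1: False, 2: True, 3: True, 4: False}, key, subs))
# {'total': 2, 'overgeneralization': 1, 'minimization': 1}
```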

Oculomotor  Measure 

Bethesda  Eye  &  Attention  Measure  (BEAM) 

The BEAM is a computerized, continuous performance task that assesses saccadic
(i.e., visual) and manual (i.e., button press) responses to stimuli presented on a computer
monitor. The BEAM consists of one block of 24 practice trials and four blocks of 48
trials. Each block is counterbalanced, with equal numbers of trial types (Directional Cue
[DC], Nondirectional Cue [NDC], Misdirectional Cue [MDC], Uncued with Gap [UC-G],
Uncued with Overlap [UC], and Directional Cue-Red Arrow [DCR]), target locations
(up, down, left, and right), and arrow cue locations (up, down, left, and right). For DC,
NDC, MDC, UC-G, and UC trials, participants are asked to look at a fixation cross at the
center of the screen until a target circle appears above, below, left, or right of the screen’s
center. Participants are asked to look at the target circle and press a button as soon as the
target circle appears. Saccadic and manual reaction time, reaction time intra-individual
variability, and omission errors are calculated for each of the five “non-inhibition” trial types and
for the overall measure. Reaction time (RT) is represented by a median score, and reaction
time intra-individual variability (RT-IIV) is represented by the standard deviation of the
reaction times. Overall saccadic and manual RT is calculated by averaging the median
reaction times across the non-inhibition trial types. Overall saccadic and manual RT-IIV
is calculated by averaging the RT-IIV values from the non-inhibition trial types.
Omission errors occur when a participant fails to look at a target circle or press the button
by the time a new trial begins (1000ms). On DCR (i.e., inhibition) trials, participants are
told not to look at the target circle or press a button when the target circle appears.
Unlike the five other trial types, which use only white arrow or diamond cues, DCR trials
use red arrow cues. Commission errors are recorded when participants look at the target
circle or press the button during DCR trials. Preliminary analyses of BEAM
psychometric properties, convergent validity, and divergent validity are discussed above
in the literature review.
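The RT and RT-IIV definitions above can be illustrated with a short sketch for one response channel (saccadic or manual). The data layout, the use of the sample standard deviation, and treating any latency of 1000ms or more as an omission are assumptions made for illustration; this is not the BEAM's actual scoring program.

```python
import statistics
from typing import Dict, List, Optional

NON_INHIBITION_TYPES = ("DC", "NDC", "MDC", "UC-G", "UC")
TRIAL_WINDOW_MS = 1000  # a new trial begins after 1000 ms (per the description above)


def beam_channel_summary(
        rts_by_type: Dict[str, List[Optional[float]]]) -> Dict[str, float]:
    """Overall RT, overall RT-IIV, and omission count for one response channel.

    Assumed input: for each non-inhibition trial type, the reaction times (ms)
    of that type's scorable trials, with None for no response. Each type needs
    at least two responses for a standard deviation to be defined.
    """
    medians: List[float] = []
    sds: List[float] = []
    omissions = 0
    for trial_type in NON_INHIBITION_TYPES:
        rts = rts_by_type[trial_type]
        hits = [rt for rt in rts if rt is not None and rt < TRIAL_WINDOW_MS]
        omissions += len(rts) - len(hits)
        medians.append(statistics.median(hits))  # RT = median per trial type
        sds.append(statistics.stdev(hits))       # RT-IIV = sample SD (assumed)
    return {
        "overall_rt": statistics.mean(medians),  # mean of per-type medians
        "overall_rt_iiv": statistics.mean(sds),  # mean of per-type SDs
        "omissions": omissions,
    }


def beam_commission_errors(dcr_responded: List[bool]) -> int:
    """Count DCR (inhibition) trials with a look at the target or a button press."""
    return sum(dcr_responded)


example = {t: [200, 300, 400, None] for t in NON_INHIBITION_TYPES}
print(beam_channel_summary(example))
```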

Of note, built-in data acquisition validity indices enhance confidence in the
obtained reaction time, reaction time intra-individual variability, omission error, and
commission error variables. Before any BEAM metrics are calculated, a custom-made
scoring program checks for lost or missing saccadic data segments. A trial is discarded if
the eye tracker loses its lock on a person’s pupil or corneal reflection on more than 20%
of the segments recorded after a target circle appears. If the trial is not discarded for data
loss, it is said to have been “successfully recorded.” Next, the scoring program checks whether
participants were following instructions on a trial. Trials are discarded if a person was
looking outside the center of the screen when a target circle appeared (i.e., “Invalid Initial
Fixations”). If a trial was successfully recorded and the initial fixation at the time of target
circle onset was in the center of the screen, the scoring program calculates BEAM
metrics for that trial.
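The two trial-level checks can be expressed as a filter applied before any metrics are computed. The `Trial` fields below are hypothetical stand-ins for whatever the custom scoring program actually reads from the eye tracker, and the central-fixation test is reduced to a precomputed flag.

```python
from dataclasses import dataclass
from typing import List

MAX_LOST_FRACTION = 0.20  # discard if lock is lost on more than 20% of segments


@dataclass
class Trial:
    # Hypothetical field: one flag per segment recorded after target onset,
    # True when the tracker held both the pupil and the corneal reflection.
    post_onset_lock: List[bool]
    # Hypothetical field: gaze was within the central region at target onset.
    central_fixation_at_onset: bool


def successfully_recorded(trial: Trial) -> bool:
    segments = trial.post_onset_lock
    if not segments:
        return False  # nothing recorded, so nothing to score
    lost = segments.count(False)
    return lost / len(segments) <= MAX_LOST_FRACTION


def scorable(trial: Trial) -> bool:
    """Apply the data-loss check first, then the invalid-initial-fixation check."""
    return successfully_recorded(trial) and trial.central_fixation_at_onset


trials = [Trial([True] * 9 + [False], True),      # 10% loss, valid fixation
          Trial([True] * 7 + [False] * 3, True),  # 30% loss: discarded
          Trial([True] * 10, False)]              # invalid initial fixation
kept = [t for t in trials if scorable(t)]
print(len(kept))  # 1
```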

Cognitive  Performance  Tasks  with  Embedded  Response  Validity  Indices 
Conners’ Continuous Performance Test, Second Edition (CPT-II)

The CPT-II is a computerized vigilance test that measures attention problems
(55). The CPT-II requires examinees to press the space bar as quickly as possible
whenever a target (i.e., any letter other than “X”) appears on the computer screen, and to
inhibit this response when the letter “X” appears on the screen. Ninety percent of letters
presented  are  targets.  Each  letter  is  presented  for  250ms,  with  varying  interstimulus 
intervals  (ISIs)  of  one,  two,  or  four  seconds  between  letters.  Full  test  administration 
includes  a  one-minute  practice  block  and  six  long  blocks  of  trials,  with  each  long  block 
containing  three  sub-blocks  of  20  trials.  Overall,  the  measure  takes  approximately  14 
minutes  to  administer. 
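As a rough consistency check on this administration structure, the block counts imply 360 scored trials. Assuming each ISI is used for an equal number of trials and that the ISI is measured from one letter onset to the next (both assumptions, not stated above), the scored blocks alone come to about 14 minutes, in line with the stated administration time.

```python
BLOCKS = 6
SUB_BLOCKS_PER_BLOCK = 3
TRIALS_PER_SUB_BLOCK = 20
ISIS_SECONDS = (1, 2, 4)
TARGET_PROPORTION = 0.90


def cpt2_layout() -> dict:
    """Trial counts and approximate duration of the scored CPT-II blocks."""
    trials = BLOCKS * SUB_BLOCKS_PER_BLOCK * TRIALS_PER_SUB_BLOCK
    trials_per_isi = trials // len(ISIS_SECONDS)   # assumption: ISIs balanced
    seconds = trials_per_isi * sum(ISIS_SECONDS)   # assumption: onset-to-onset ISI
    targets = round(trials * TARGET_PROPORTION)
    return {"trials": trials, "targets": targets,
            "non_targets": trials - targets, "minutes": seconds / 60}


print(cpt2_layout())
```

The one-minute practice block is excluded from this estimate.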

The CPT-II generates 12 indices of responding: Hit Reaction Time (Hit
RT), Hit Reaction Time Standard Error (Hit RT SE), Omissions, Commissions,
Variability of Standard Error, Hit Reaction Time Block Change (Hit RT Block Change),
Hit Standard Error Block Change (Hit SE Block Change), Hit Reaction Time
Interstimulus Interval Change (Hit RT ISI Change), Hit Standard Error Interstimulus
Interval Change (Hit SE ISI Change), Detectability10 (d′), Response Style (β), and
Perseverations (55). Eight indices measure inattention: Omissions, Commissions, Hit
RT, Hit RT SE, Hit RT ISI Change, Hit SE ISI Change, Variability, and Detectability.
The Commissions index and Hit RT also measure impulsivity, along with Perseverations.
Hit RT Block Change and Hit SE Block Change measure vigilance and alertness (55;
207).

10 The Detectability index is also referred to as the Attentiveness index.

Each index is designed to measure a distinct aspect of attention (55; 207). Hit RT measures
the average speed of correct responses for the entire test, and Hit RT SE measures
the inconsistency of response speed, with higher scores suggesting inconsistent responding.
Omission and Commission indices identify failures to respond to targets and responses to
non-targets, respectively. Variability measures reaction time variability across 18
segments of the test in relation to overall Hit RT SE. Hit RT Block Change describes
changes in reaction time across the duration of the test, with higher scores suggesting a
slowing  of  reaction  time.  Hit  SE  Block  Change  measures  changes  in  response 
consistency  over  the  course  of  the  test,  with  higher  scores  suggesting  a  loss  of 
consistency.  Hit  RT  ISI  Change  measures  changes  in  reaction  time  at  different  ISIs  (i.e., 
one,  two,  or  four  seconds),  and  Hit  SE  ISI  Change  describes  the  change  in  reaction  time 
consistency  across  different  ISIs.  Detectability  measures  the  examinee’s  ability  to 
distinguish a target from a non-target. Response Style describes an examinee’s
responding tendencies, with higher scores suggesting cautious, accurate responding and lower
scores suggesting attempts to respond to all targets at the expense of accuracy. Lastly,
Perseverations indicate the number of reaction times less than 100ms, which
suggest the examinee is anticipating a stimulus rather than reacting to one.
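A simplified sketch of how the three error-type indices could be tallied from trial-level data is shown below. It assumes one response opportunity per letter and counts every sub-100ms response as a perseveration; the CPT-II software's exact rules (for example, how anticipatory responses interact with hits) are not reproduced here, and the function name is hypothetical.

```python
from typing import Dict, List, Optional

ANTICIPATORY_MS = 100  # responses faster than this are treated as anticipatory


def cpt_error_counts(letters: List[str],
                     rts_ms: List[Optional[float]]) -> Dict[str, int]:
    """Tally omissions, commissions, and perseverations (simplified sketch).

    letters[i] is the stimulus on trial i; rts_ms[i] is the response latency
    in ms, or None if the examinee did not respond.
    """
    counts = {"omissions": 0, "commissions": 0, "perseverations": 0}
    for letter, rt in zip(letters, rts_ms):
        is_target = letter != "X"
        if rt is None:
            if is_target:
                counts["omissions"] += 1   # missed a target
            continue
        if not is_target:
            counts["commissions"] += 1     # responded to "X"
        if rt < ANTICIPATORY_MS:
            counts["perseverations"] += 1  # too fast to be a reaction
    return counts


print(cpt_error_counts(["A", "X", "B", "C", "X"], [350, None, None, 80, 420]))
# {'omissions': 1, 'commissions': 1, 'perseverations': 1}
```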

Chen  and  colleagues  (53)  reported  that  Omissions,  Commissions,  Hit  RT,  Hit  RT 
SE,  and  Variability  display  acceptable-to-excellent  test-retest  reliability,  ranging  from 
.70-.90. In the normative sample, test-retest reliability ranged from .55 to .84 for
the same five measures (55). While the CPT-II has previously demonstrated sensitivity
to mild TBI in the chronic phase of recovery (147), CPT-II performance does not appear
to differ significantly by TBI severity (i.e., mild, moderate, or severe) in civilian and
military samples (149).

According  to  the  CPT-II  manual,  Response  Style,  Omissions,  and  Perseverations 
can  be  used  to  detect  invalid  responding  (55;  207).  Response  Style  T-scores  below  40  or 
greater  than  60  suggest  overly  impulsive  or  overly  cautious  responding,  respectively. 
Extremely high T-scores (T > 100) on Omissions and Perseverations suggest
misunderstanding of instructions and inaccurate results. Recently, several researchers
have evaluated the CPT-II using known groups of valid vs. invalid responders (42; 149;
198). After submitting all CPT-II variables to receiver operating characteristic (ROC)
curve analyses, Ord and colleagues (198) reported that Omissions (area under the curve [AUC]=0.77,
95% CI: .65-.89) and Hit RT SE (AUC=0.77, 95% CI: .65-.90) produced the best
classification accuracy for probable or definite malingered neurocognitive
dysfunction (MND; 244) in a sample of 88 TBI cases of all severities with high external
incentives. Using Ord et al.’s (2010) data, Schutte and Axelrod (234) reported that >19 raw
Omissions yielded specificity of 90% and sensitivity of 41%, and raw Hit RT SE values
>13 yielded specificity of 90% and sensitivity of 52%.
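The AUC values reported in these ROC studies can be read as the probability that a randomly chosen invalid responder scores worse than a randomly chosen valid responder. A minimal sketch using that Mann-Whitney formulation follows, assuming higher raw scores (such as more Omissions) indicate invalid responding; the example scores are toy values, not data from the cited studies.

```python
from typing import Sequence


def roc_auc(valid_scores: Sequence[float], invalid_scores: Sequence[float]) -> float:
    """AUC as P(invalid score > valid score), counting ties as one half.

    Assumes higher scores signal invalid responding (e.g., raw Omissions).
    """
    wins = 0.0
    for x in invalid_scores:
        for y in valid_scores:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(invalid_scores) * len(valid_scores))


# Toy omission counts, not data from the cited studies
print(roc_auc(valid_scores=[2, 4, 5, 9], invalid_scores=[3, 12, 20, 25]))  # 0.8125
```

An AUC of .50 corresponds to chance-level discrimination and 1.0 to perfect separation of the groups.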

Busse and Whiteside (42) examined 413 consecutively referred
neuropsychological evaluations and also reported that CPT-II Omissions had acceptable
classification accuracy for invalid responding (AUC=0.76). Using a cut score of >12 raw
omissions, the authors reported specificity of 88% and sensitivity of 52% (42). In a
separate study of 158 deployed U.S. Service Members with a history of deployment-related
mild and severe TBI, Lange and colleagues (149) reported that Omissions (.69 <
AUC < .75), Commissions (.76 < AUC < .79), and Perseverations (.70 < AUC < .79)11
demonstrated the best invalid responding classification accuracy among the CPT-II
variables. When comparing the groups of mild TBI participants who either passed or
failed effort testing, the authors reported that >11 raw omissions had specificity of 91% and
sensitivity of 31%, >21 raw commissions had specificity of 86% and sensitivity of 45%,
and >1 raw perseveration had specificity of 93% and sensitivity of 43% (149). Similar
results were obtained when comparing mild TBI participants who failed effort
testing with severe TBI participants who passed effort testing: >11 raw omissions had
specificity of 93% and sensitivity of 31%, >21 raw commissions had specificity of 93%
and sensitivity of 45%, and >1 raw perseveration had specificity of 90% and sensitivity
of 43% (149). Overall, embedded response validity indices in the CPT-II appear to
demonstrate some utility for detecting invalid responding, although the indices may
be better used to “rule in” invalid responding than to rule it out (42; 149; 198).

11 Ranges include point AUC results for both T-scores and raw scores among mild TBI-fail vs.
mild TBI-pass comparisons and mild TBI-fail vs. severe TBI-pass comparisons.
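Each cut score above implies a sensitivity/specificity pair computed from known groups. A minimal sketch for a "greater than cutoff" rule follows, using toy scores rather than data from the cited studies. Because specificity is held high at these cutoffs, a score above the cutoff is uncommon among valid responders, which is why such indices suit ruling invalid responding in rather than out.

```python
from typing import Sequence, Tuple


def cut_score_accuracy(valid_scores: Sequence[float],
                       invalid_scores: Sequence[float],
                       cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity of the rule 'score > cutoff => invalid'."""
    sensitivity = sum(s > cutoff for s in invalid_scores) / len(invalid_scores)
    specificity = sum(s <= cutoff for s in valid_scores) / len(valid_scores)
    return sensitivity, specificity


# Toy raw omission counts with a '>11' rule
sens, spec = cut_score_accuracy(valid_scores=[0, 3, 8, 12, 20],
                                invalid_scores=[5, 14, 25, 30],
                                cutoff=11)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
# sensitivity=0.75, specificity=0.60
```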

WAIS-IV  Digit  Span  Subtest 

The  Digit  Span  subtest  of  the  Wechsler  Adult  Intelligence  Scale-Fourth  Edition 
(WAIS-IV;  278)  measures  attention  and  working  memory,  and  consists  of  three  separate 
components: Digit Span Forward, Digit Span Backward, and Digit Span Sequencing. In
Digit Span Forward, participants repeat numbers that are spoken to them (e.g., the correct
response to “1-2-3-4” is “1-2-3-4”). In Digit Span Backward, participants repeat the
numbers that are spoken to them in reverse order (e.g., the correct response to “1-2-3-4”
is “4-3-2-1”). Lastly, Digit Span Sequencing requires participants to order the
numbers that are spoken to them from lowest to highest (e.g., the correct response to “3-2-4-1”
is “1-2-3-4”). On each component of the Digit Span subtest, participants
proceed to progressively longer digit spans until they respond incorrectly to both trials
of the same span length. The Digit Span subtest demonstrates excellent
internal consistency (.93) and good test-retest reliability (.82; 278).

The  Digit  Span  subtest  has  spawned  numerous  studies  examining  embedded 
response  validity  indices  (234;  252;  255).  One  of  the  most  researched  embedded 
response  validity  tests  (RVTs)  in  Digit  Span  is  Reliable  Digit  Span  (RDS;  101;  pp.  219- 
220),  which  is  “calculated  by  summing  the  longest  string  of  digits  repeated  without  error 
over two trials under both forward and backward conditions.” For example, a participant
who correctly responds on both trials with four digits forward, correctly responds on both
trials with three digits backward, but incorrectly responds to one or both trials beyond
that point would earn an RDS score of seven. Larrabee (153) found that RDS scores <8 had
specificity of 94% and sensitivity of 50%. Babikian and colleagues (14) reported that RDS
scores <7 had specificity of 93% and sensitivity of 45%. Recently, Schroeder and
colleagues (2012) conducted a systematic review and cross-validation study of RDS and
concluded that RDS can be used effectively in many clinical samples, with cutoff scores
<7 having global specificity and sensitivity rates of 96% and 30%, respectively.12
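The RDS definition and the worked example above can be sketched as follows, assuming each condition is recorded as a mapping from span length to the pass/fail outcome of its two trials (a hypothetical data layout; the function names are illustrative).

```python
from typing import Dict, Tuple

# Hypothetical layout: span length -> (trial 1 correct, trial 2 correct)
SpanTrials = Dict[int, Tuple[bool, bool]]


def reliable_span(trials: SpanTrials) -> int:
    """Longest span length at which both trials were repeated without error."""
    return max((length for length, (first, second) in trials.items()
                if first and second), default=0)


def reliable_digit_span(forward: SpanTrials, backward: SpanTrials) -> int:
    """RDS = reliable forward span + reliable backward span."""
    return reliable_span(forward) + reliable_span(backward)


# The worked example from the text: both trials passed through four digits
# forward and three digits backward, with errors beyond those lengths.
forward = {2: (True, True), 3: (True, True), 4: (True, True),
           5: (True, False), 6: (False, False)}
backward = {2: (True, True), 3: (True, True), 4: (False, True),
            5: (False, False)}
print(reliable_digit_span(forward, backward))  # 7
```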

The  age-corrected  scaled  score  (ACSS)  on  the  Digit  Span  subtest  has  also  been 
researched  as  an  embedded  response  validity  index.  In  addition  to  their  RDS  findings, 
Babikian  and  colleagues  (14)  found  ACSSs  <6  have  specificity  of  93%  and  sensitivity  of 
42%.  By  comparison,  Axelrod  and  colleagues  (12)  reported  ACSSs  <6  have  specificity 
of  97%  and  sensitivity  of  36%. 

A recent meta-analysis of 24 studies using RDS or ACSS to detect invalid
responding found that both indices effectively discriminated between valid and invalid
responders, with an average RDS Cohen’s d effect size of 1.34 (95% CI: 1.18-1.50) and
an average ACSS effect size of 1.08 (95% CI: 1.01-1.50; 137). The same study found that
both indices demonstrated strong overall specificity (RDS M = 86.1%; ACSS M =
86.5%) and good sensitivity (RDS M = 63.3%; ACSS M = 59.7%), with no significant
classification accuracy differences between RDS and ACSS (137). A recent study
evaluating RDS and ACSS in a pediatric sample (ages 8-16) with the Wechsler
Intelligence Scale for Children-Fourth Edition (WISC-IV; 277) reported that ACSS cut scores <6 had
specificity of 96% and sensitivity of 51%, and RDS cut scores <7 had specificity of 92%
and sensitivity of 51% (145). Additionally, depression of all severities and subtypes does
not affect RDS or ACSS on the Digit Span subtest (90).

12 Global specificity and sensitivity rates were calculated using weighted averages.
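The meta-analytic averages above summarize per-study standardized mean differences. As a reminder of what a single Cohen's d represents, here is a minimal sketch using the pooled standard deviation; the group scores are toy values, and the cited meta-analysis may have used a different estimator or weighting scheme.

```python
import math
from statistics import mean, variance
from typing import Sequence


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Standardized mean difference (mean_a - mean_b) / pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Toy RDS scores: valid responders vs. invalid responders
print(cohens_d([10, 12, 14], [7, 9, 11]))  # 1.5
```

A d of 1.34 thus means the valid and invalid group means differ by about 1.34 pooled standard deviations.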

Most  of  the  available  studies  examining  RDS  and  ACSS  have  used  Digit  Span 
subtests  from  WAIS-IV  predecessors  that  only  included  Digit  Span  Forward  and  Digit 
Span  Backward  components  (137;  233).  The  WAIS-IV,  however,  has  an  additional 
sequencing  component  to  its  Digit  Span  subtest  (278),  prompting  researchers  to  consider 
a revised RDS index that includes all three Digit Span components. As cited in Young et al.
(284), Spencer and colleagues were the first to study the Reliable Digit Span-Revised
(RDS-R), which at cutoff scores <12 demonstrated specificity of 89% and sensitivity of
59%. Seeking to further examine the classification accuracy of the RDS-R index on the
WAIS-IV Digit Span subtest, Young and colleagues (284) found that RDS-R scores <11
demonstrated specificity of 89% and sensitivity of 32%, and scores <12 yielded
specificity of 78% and sensitivity of 48%.13 Reese, Suhr, and Riddle (210) reported that
RDS-R (which they called “Enhanced RDS”) values <12 yielded specificity of 94% in
their head-injured sample and sensitivity of 59%. Reese and colleagues (210) also studied
Alternative RDS (A-RDS), calculated by summing reliable digit forward and
reliable digit sequencing; they reported that A-RDS values <10 had specificity of 87% in their
head-injured sample and sensitivity of 78%.

13 These specificity and sensitivity rates are poorer than the classification accuracy statistics cited
in the Spencer et al. study.

Taken together, a wealth of research suggests that the Digit Span subtest’s
embedded response validity indices (e.g., RDS, ACSS) can be useful for
detecting invalid responding. Additional indices particular to the WAIS-IV (i.e., A-RDS,
RDS-R) demonstrate potential utility for identifying invalid responding. However,
more evidence is needed before using RDS-R or A-RDS instead of RDS or ACSS.
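The revised indices extend the same reliable-span logic to the WAIS-IV sequencing condition: RDS-R adds the reliable sequencing span to the forward and backward spans, and A-RDS sums the forward and sequencing spans. The sketch below assumes the same hypothetical span-length-to-trial-outcome layout used for RDS; the function names are illustrative.

```python
from typing import Dict, Tuple

SpanTrials = Dict[int, Tuple[bool, bool]]  # hypothetical: span -> (trial 1, trial 2)


def reliable_span(trials: SpanTrials) -> int:
    """Longest span length at which both trials were correct (0 if none)."""
    return max((length for length, (first, second) in trials.items()
                if first and second), default=0)


def rds_revised(forward: SpanTrials, backward: SpanTrials,
                sequencing: SpanTrials) -> int:
    """RDS-R: reliable forward + backward + sequencing spans."""
    return (reliable_span(forward) + reliable_span(backward)
            + reliable_span(sequencing))


def alternative_rds(forward: SpanTrials, sequencing: SpanTrials) -> int:
    """A-RDS: reliable forward + reliable sequencing spans."""
    return reliable_span(forward) + reliable_span(sequencing)


fwd = {n: (True, True) for n in range(2, 6)}
fwd[6] = (False, False)                       # reliable forward span = 5
bwd = {n: (True, True) for n in range(2, 5)}
bwd[5] = (True, False)                        # reliable backward span = 4
seq = {n: (True, True) for n in range(2, 5)}  # reliable sequencing span = 4
print(rds_revised(fwd, bwd, seq), alternative_rds(fwd, seq))  # 13 9
```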

Trail  Making  Test  (TMT) 

The  TMT  is  a  graphomotor  test  consisting  of  two  components,  Part  A  and  Part  B 
(212).  In  TMT  A,  participants  draw  lines  to  connect  numbered  circles  in  order;  the  task 
largely  depends  on  the  participant’s  psychomotor  speed  and  visual  search  abilities.  In 
TMT  B,  participants  draw  lines  to  connect  circles  with  alternating  numbers  and  letters; 
this  task  places  additional  demands  on  the  participant’s  working  memory,  cognitive 
flexibility,  and  executive  functioning.  The  score  on  each  part  of  the  TMT  is  determined 
by the time required to complete each part. The TMT displays sufficient test-retest
reliability for Parts A and B (.79 and .89, respectively; 67) and moderate-to-sufficient
construct validity (.36-.93; 70). Recently, Tsirka and colleagues (267) reported
differences between a control group and a mild TBI group on the TMT.

The  TMT  was  one  of  the  first  measures  to  be  evaluated  for  embedded  response 
validity  indices  (89;  115;  265).  Error  rates  (225),  completion  times  (134),  and 
completion  time  ratios  (89)  are  some  of  the  most  commonly  studied  embedded  indices  on 
the  TMT.  According  to  Suhr  and  Barrash  (252),  decades  of  embedded  response  validity 
research using the TMT have produced equivocal results. Iverson and colleagues (134)
found that TMT A completion times >62 seconds had 100% specificity but only 17%
sensitivity to invalid responding; the authors also reported that TMT B completion times
>199 seconds had 100% specificity but only 7% sensitivity. Recently, Busse and
Whiteside (42) reported that TMT B was able to distinguish between biased (i.e., invalid)
and unbiased (i.e., valid) responders (AUC=.75), with TMT B completion times >119
seconds yielding specificity of 85% and sensitivity of 61%. Drawing from a sample of
76 consecutive patients with mixed acquired brain injury who were being evaluated for outpatient brain
injury rehabilitation, Powell, Locke, and Smigielski (205) reported that TMT A completion
times greater than 47 seconds had specificity of 83% and sensitivity of 72%. Powell and
colleagues (205) also reported that TMT B completion times >124 seconds had specificity
of 81% and sensitivity of 50%. In summary, slower completion times on the TMT
appear to be associated with invalid responding, but embedded TMT validity indices do
not appear sensitive enough to detect invalid responding on their own
(252; 255). As such, the TMT may provide useful supplemental data for this dissertation
project.

Freestanding  Response  Validity  Tests 
Victoria  Symptom  Validity  Test  (VSVT) 

The VSVT is a computerized, freestanding response validity test that uses a
forced-choice, digit recognition paradigm to assess the possible feigning or exaggeration
of cognitive impairments (243). On each of forty-eight trials (three blocks of sixteen trials),
participants are shown a five-digit sequence and then asked to choose between two
options: 1) the correct five-digit number, or 2) a foil (i.e., a similar but different digit
sequence). Items are categorized as either “easy” or “difficult,” depending on the
similarity of the foil to the correct five-digit number. The test yields Total Items
Correct scores, broken down by item type (easy and difficult), along with Response
Latency and Right-Left Preference scores. Like other forced-choice measures, the
VSVT is interpreted by comparing the obtained score with the score expected
from chance alone. The assessment takes approximately 18 to 25
minutes to administer (243; 261).
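Comparing an obtained score with chance can be made concrete with a one-tailed binomial probability: with two options per item, a pure guesser is expected to be correct on about half of the items. The minimal sketch below assumes independent items and a 50% guessing rate; it illustrates the chance comparison only and is not the VSVT's published scoring procedure.

```python
from math import comb


def below_chance_probability(correct: int, n_items: int,
                             p_guess: float = 0.5) -> float:
    """P(X <= correct) for X ~ Binomial(n_items, p_guess).

    A small value means a score this low would rarely arise from guessing,
    i.e., performance is significantly below chance.
    """
    if not 0 <= correct <= n_items:
        raise ValueError("correct must be between 0 and n_items")
    return sum(comb(n_items, k) * p_guess ** k * (1 - p_guess) ** (n_items - k)
               for k in range(correct + 1))


# Example: 16 of 48 items correct (toy value)
print(f"{below_chance_probability(16, 48):.4f}")
```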

In  a  recent  meta-analysis  of  freestanding  RVTs,  Sollman  and  Berry  (248)  reported 
the  combined  effect  size  of  the  VSVT  Hard  index  in  differentiating  valid  and  invalid 
responding  groups  was  d=2.11  (95%  CI:  2.32-3.22),  significantly  higher  than  the  Word 
Memory  Test  (WMT;  93),  Test  of  Memory  Malingering  (TOMM;  263),  the  Letter 
Memory  Test  (LMT;  130),  and  the  Medical  Symptom  Validity  Test  (MSVT;  94). 
Furthermore, Sollman and Berry (248) reported that a VSVT Difficult Items Correct
cutoff score of 15 or below produced an average specificity of 95.5% (95% CI: 76.4-100%)
and average sensitivity of 81.5% (95% CI: 75.1-87.9%). Additionally, VSVT
failure rates have consistently been shown to be higher in compensation-seeking samples
than in clinical populations (103). Available data suggest the VSVT is relatively unaffected
by psychosis, as well as by depression of all severities and subtypes (90). The VSVT
appears to be a highly sensitive and specific freestanding RVT.

Medical  Symptom  Validity  Test  (MSVT) 

The MSVT is a computerized, forced-choice response validity test used to
assess cognitive effort and the possible feigning or exaggeration of symptoms (94).
During the MSVT, a list of word pairs (e.g., “skipping” and “rope”) is presented twice on
the computer screen. The participant is then asked to choose the correct word from a pair
of words, one being the target and one being the foil. After a 10-minute delay, the
participant performs the forced-choice task again and then completes the paired
associates and free recall subtests (94; 126).
In Sollman and Berry’s (248) meta-analysis of freestanding RVTs, the MSVT
pass/fail index had a combined effect size (d) of 0.94 (95% CI: 0.70-1.19) when
differentiating valid and invalid responding groups14. The authors (248) also reported
that the MSVT pass/fail index demonstrated good-to-excellent classification accuracy for
invalid responding, producing an average specificity of 91.3% (95% CI: 64.1-100%) and
average sensitivity of 70.0% (95% CI: 13.1-100.0%). The MSVT’s comparative ease of
use and classification accuracy make it a useful supplemental freestanding RVT for this
project.

Procedure 

As  described  earlier,  this  dissertation  project  incorporated  a  prospective  simulator 
study  and  data  collected  from  this  dissertation  project’s  parent  study.  The  simulator 
study  included  two  groups  of  persons  without  a  history  of  traumatic  brain  injury:  a  group 
biased  to  perform  as  if  they  sustained  a  head  injury  (biased  responding;  BR)  and  a  control 
group  asked  to  perform  their  best  (unbiased  responding;  UR).  The  parent  study’s  data 
provided  a  clinical  comparison  group  of  persons  with  self-reported  history  of  mild  TBI 
who  did  not  meet  criteria  for  invalid  responding  (unbiased  responding  with  mild  TBI; 
UR-mTBI). Research staff were divided into two groups: study coordinators and
examiners. Study coordinators were responsible for assigning participants to groups,
collecting pre-test data, and conducting post-test interviews. Examiners were responsible
for administering and scoring the neurocognitive battery, and they were blinded to group
assignment. The following sections describe the procedure used in the prospective
simulator study and the procedure used in the parent study.

14  Surprisingly,  this  large  effect  size  was  significantly  lower  than  the  effect  sizes  found  for  the 
VSVT,  WMT,  TOMM,  and  LMT. 
Simulator Study: BR and UR Groups

Potential participants were screened via a phone interview in which a lab member obtained head injury
history, neurological illness history, and demographic information. If the participant met inclusion criteria and did not meet exclusion criteria, the
participant was scheduled to come to the lab and complete the simulator study
neurocognitive assessment battery. After the phone interview, but before the participant
arrived for assessment, a study coordinator assigned the participant a study
identification number. Study identification numbers began at “1” and increased
sequentially as additional participants were scheduled. Using a random group assignment
plan created from a randomly permuted block assignment generator program (60), the
study coordinator then assigned the participant to either the biased responding (BR) or
unbiased responding (UR) group. Participants were not told of their group assignment in
advance of the assessment date in order to mitigate potentially confounding effects of test
preparation or coaching (241).
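A randomly permuted block plan of the kind described can be sketched as below: within each block, BR and UR assignments appear equally often in shuffled order, so group sizes cannot drift far apart as study IDs accrue. The block size of 4 and the function name are assumptions; the settings of the generator program actually used (60) are not reported here.

```python
import random
from typing import List, Optional, Sequence


def permuted_block_schedule(n_participants: int, block_size: int = 4,
                            groups: Sequence[str] = ("BR", "UR"),
                            seed: Optional[int] = None) -> List[str]:
    """Return a group label for study IDs 1..n_participants, in order.

    Each block contains every group equally often in random order
    (hypothetical parameters; block size 4 is an assumption).
    """
    if block_size % len(groups) != 0:
        raise ValueError("block_size must be a multiple of the number of groups")
    rng = random.Random(seed)  # a seeded generator makes the plan reproducible
    schedule: List[str] = []
    while len(schedule) < n_participants:
        block = list(groups) * (block_size // len(groups))
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:n_participants]


plan = permuted_block_schedule(50, seed=2015)
study_id = 7
print(study_id, plan[study_id - 1])  # group for the participant with study ID 7
```

Because study IDs start at "1", the participant with ID `k` receives `plan[k - 1]`.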

Once the participant arrived for testing, a study coordinator obtained their
informed consent. If the participant agreed to participate, a study coordinator
administered the baseline interview, WTAR, and HIKS Version B.15 A study coordinator
then presented a group-specific assignment script (see Appendix C: Simulator Study
Group Assignment Scripts) to the participant and asked the participant not to reveal
group membership to the examiner. The group assignment scripts were adapted from
previous studies of invalid responding (71; 251; 257; 279) and asked both
BR and UR group members to act as if they had been involved in a remote vehicular
accident. While both scripts stated that the participants did not feel any lingering effects
from the accident, the biased responding script asked participants to exaggerate cognitive
problems in order to get money from an insurance company. Furthermore, the biased
responding scenario warned participants to fake believably16. By contrast, the unbiased
responding scenario asked participants to perform their best. After the group scenarios
were presented, a study coordinator addressed the participant’s
questions, comments, or concerns, if necessary.

15 HIKS Version B is designed for persons who have not sustained a TBI, and was given to all
simulator study participants regardless of group assignment.

When  ready,  the  participant  was  introduced  to  his  or  her  examiner  and  completed 
a 1.5-hour neurocognitive assessment battery. Examiners consisted of clinical
psychology doctoral students who had completed graduate courses and lab-internal
training on clinical assessment. All examiners’ testing competence and protocol
adherence were verified by Dr. Ettenhofer and this author prior to assessing any study
participants. The assessment proceeded in the following order: 1) BEAM, 2) VSVT, 3)
Digit Span, 4) CPT-II, 5) MSVT, 6) TMT A & B, 7) King-Devick Test17, and 8)
Neurobehavioral Symptom Inventory18. Once the neurocognitive assessment was
completed, the examiner thanked the participant for taking part in the study and left the
room. A study coordinator who knew the participant’s assigned group then administered
a group-assignment-specific feedback interview (see Appendix C: Simulator Study
Feedback Interview) and debrief script (see Appendix C: Simulator Study Debrief
Scripts) to the participant.


16  In  their  meta-analysis  of  38  studies  of  invalid  responding,  Sollman  and  Berry  (2011)  reported 
that  warnings  to  fake  believably  significantly  increased  the  effect  size  of  freestanding  RVT  score 
differences  between  valid  and  invalid  responding  groups  of  healthy  simulators. 

17  The  King-Devick  Test  was  added  for  secondary  analyses  not  included  in  this  project’s  aims. 

18  The  Neurobehavioral  Symptom  Inventory  was  added  for  secondary  analyses  not  included  in  this 
project’s  aims. 
Parent Study: UR-mTBI Group

The  UR-mTBI  group  data  were  collected  from  this  dissertation  project’s  parent 
study.  Participants  who  met  criteria  for  inclusion  in  the  TBI  cohort  were  scheduled  for 
testing  with  a  lab  examiner.  Examiners  consisted  of  clinical  psychology  doctoral 
students  who  completed  graduate  courses  and  lab-internal  training  on  clinical  assessment. 
All  examiners’  testing  competence  and  protocol  adherence  were  verified  by  Dr. 
Ettenhofer prior to assessing any study participants. Participants who were eligible for
$40  compensation  were  made  aware  that  the  compensation  was  fixed  prior  to  their  arrival 
for  testing. 

On the day of testing, the examiners administered a semi-structured interview that
asked about the participant’s history of brain injury, including information about the most
recent and most severe injuries, mechanism of injury, length of loss of consciousness,
and length of posttraumatic amnesia, among other topics. Current level of fatigue,
alcohol/nicotine/caffeine consumption within the previous 12 hours, and medication
information were also collected. After the semi-structured interview, the UR-mTBI group
participants  were  administered  the  BEAM  and  a  comprehensive  neurocognitive  battery 
that  measured  domains  of  attention,  executive  function,  memory,  processing  speed,  and 
psychomotor  ability.  The  participants  were  asked  to  perform  their  best  throughout  the 
battery,  and  all  attempts  were  made  by  examiners  to  eliminate  sources  of  response  bias. 

Since  the  archival  UR-mTBI  group  data  were  drawn  from  a  parent  study  with  an 
established  assessment  battery,  not  all  of  the  measures  from  the  prospective  simulator 
study  could  be  compared  to  the  UR-mTBI  group.  Specifically,  there  were  no 
freestanding  response  validity  tests  to  compare  between  BR,  UR,  and  UR-mTBI  groups. 
However, there were several measures with empirically validated embedded response
validity indices that were used to identify “valid” and “invalid” responders within the
parent study’s TBI cohort. Of the neurocognitive assessments given to the BR and UR
groups, the UR-mTBI group had group comparison data for the following: 1) Baseline
interview, 2) BEAM, 3) WTAR, 4) Digit Span, 5) CPT-II, 6) TMT, and 7) NSI19.

Data  Analysis 
General  Analytic  Strategy 

Data  from  the  simulator  study  were  analyzed  to  address  Specific  Aims  1  and  2. 
Data  from  the  parent  study  were  compared  with  simulator  study  data  to  address  Specific 
Aim  3.  First,  descriptive  statistics  were  calculated  for  the  two  simulator  study  groups: 
biased  responders  without  a  history  of  brain  injury  (BR)  and  unbiased  responders  without 
a history of brain injury (UR). Demographics (e.g., age, sex, race/ethnicity, and years of
education), estimated premorbid intelligence (i.e., WTAR), and knowledge of head
injuries (i.e., HIKS) were compared using chi-square analyses or t-tests to identify any
significant demographic differences between the BR and UR groups.

To address Specific Aims 1 and 2, receiver operating characteristic (ROC)
analyses and logistic regressions were performed to identify BEAM, embedded RVT, and
freestanding RVT variables with the greatest potential to differentiate between groups of
biased and unbiased responders (64; 110; 127; 168; 172). Variables with area under the
curve (AUC) greater than or equal to 0.7 (acceptable classification accuracy) and p values
less than or equal to .05 were identified. Because a large number of variables met
this criterion, only variables with AUC greater than or equal to 0.9 (i.e., outstanding
classification accuracy) were submitted to subsequent analyses. Shapiro-Wilk and
Levene’s tests were performed to test for normality and homogeneity of variance,
respectively. Independent-samples t-tests and Mann-Whitney U tests were then
conducted to identify group differences and effect sizes. Classification accuracy
statistics (sensitivity, specificity, hit rate, positive predictive value, negative predictive
value, and likelihood ratios) were calculated for cutoff scores that approximated 90%
specificity or higher. Stepwise logistic regression analyses were conducted on variables
with outstanding classification accuracy in order to determine the best and most
representative variables among the BEAM, embedded RVTs, and freestanding RVTs.
Hierarchical logistic regressions were conducted on these representative variables to
determine the incremental predictive value of the BEAM above and beyond embedded and
freestanding RVTs.

19 NSI data were considered secondary to this dissertation project’s aims and were collected for
future studies.

To  address  Specific  Aim  3,  the  variables  that  demonstrated  outstanding 
classification  accuracy  were  compared  to  archival  data  of  unbiased  responders  with  a 
history  of  mild  traumatic  brain  injury  (UR-mTBI).  Shapiro-Wilk  and  Levene’s  tests  were 
performed  to  test  for  normality  and  homogeneity  of  variance,  respectively.  One-way 
ANOVA  and  Kruskal-Wallis  tests  were  performed  to  identify  omnibus  group  differences 
among  the  BR,  UR,  and  UR-mTBI  groups.  Post-hoc  Tukey  HSD  and  Mann-Whitney  U 
tests  were  performed  to  identify  group  differences  and  effect  sizes  between  the  three 
groups.  Sensitivity  (SN)  and  specificity  (SP)  were  calculated  for  UR-mTBI  cutoff  scores 
that  approximated  90%  specificity  or  higher. 
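The classification accuracy statistics named in this section all derive from a 2 x 2 table that crosses group membership with whether a score falls beyond a cutoff. The sketch below is a minimal Python illustration. It assumes that “hit rate” means the overall proportion of participants correctly classified, and the counts in the example are hypothetical rather than results from this study.

```python
from dataclasses import dataclass


@dataclass
class ClassificationStats:
    sensitivity: float
    specificity: float
    hit_rate: float      # assumed here to mean overall correct classification
    ppv: float
    npv: float
    lr_positive: float
    lr_negative: float


def classification_stats(tp, fn, tn, fp):
    """Accuracy statistics for one cutoff score.

    tp: invalid (BR) responders flagged by the cutoff
    fn: invalid responders not flagged
    tn: valid (UR) responders not flagged
    fp: valid responders flagged
    """
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    hit_rate = (tp + tn) / (tp + fn + tn + fp)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    # Likelihood ratios: how much a positive or negative result shifts the odds
    lr_pos = sn / (1 - sp) if sp < 1 else float("inf")
    lr_neg = (1 - sn) / sp if sp > 0 else float("inf")
    return ClassificationStats(sn, sp, hit_rate, ppv, npv, lr_pos, lr_neg)


# Hypothetical counts for 24 biased and 26 unbiased responders
print(classification_stats(tp=20, fn=4, tn=24, fp=2))
```

Unlike sensitivity and specificity, PPV and NPV depend on the proportion of invalid responders in the sample (close to 50% in a simulator design), so they would shift in settings with different base rates.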

Control  Variables 

Age, gender, and years of education often influence normative values in
neurocognitive assessment (114). Variables obtained from the WTAR, TMT, CPT-II,
and Digit Span can be corrected for one or more of these demographic variables using
normative data (55; 114; 278). However, BEAM variables are not corrected for
demographic variables and cannot be meaningfully compared to demographically
adjusted scores from other neurocognitive tests. As such, uncorrected or “raw” values
were used for all primary analyses.

Data  Analytic  Strategy  for  Aims  and  Hypotheses 

Specific  Aim  1 

To  support  Specific  Aim  1  of  this  dissertation  project,  BEAM  data  were  analyzed 
to  determine  which  variables  demonstrated  the  best  ability  to  differentiate  valid  from 
invalid responding. Since the BEAM is a novel test of cognitive functioning, little was
known about its ability to detect invalid responding. As such, twenty-nine
BEAM variables (e.g., saccadic, manual, and data validity metrics) were submitted to
ROC analyses. Twenty-five BEAM variables met the initial criteria of AUC > 0.7 and p
< .05. To reduce the risk of Type I error when identifying statistically significant
variables (167; 202), only variables with AUC greater than or equal to 0.9 and p values
less than .05 were selected for additional analyses. It should be noted that AUC
estimates from ROC curves are similar for a wide range of normal and non-normal
distributions (109), so the default parametric ROC analyses available in SPSS version 20
were  conducted.  Classification  accuracy  tables  were  prepared.  Cutoff  scores  that 
approximated  90%  specificity  or  higher  were  calculated  for  all  BEAM  variables  that 
demonstrated  outstanding  classification  accuracy. 
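An AUC can be read as the probability that a randomly chosen biased responder obtains a worse score than a randomly chosen unbiased responder. The sketch below computes this nonparametric (empirical) AUC by comparing every pair of scores. It is a different estimator from the parametric SPSS procedure described above, and it assumes that higher scores indicate worse performance, as with RT-IIV or error percentages. The labeling bands (acceptable at 0.7, excellent at 0.8, outstanding at 0.9) are inferred from how AUC values are labeled in this chapter.

```python
def empirical_auc(invalid_scores, valid_scores):
    """Probability that an invalid responder scores higher (worse) than a valid
    responder, counting ties as one half (equivalent to U / (n1 * n2))."""
    wins = 0.0
    for x in invalid_scores:
        for y in valid_scores:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(invalid_scores) * len(valid_scores))


def auc_label(auc):
    """Map an AUC to the descriptive bands used in this chapter."""
    if auc >= 0.9:
        return "outstanding"
    if auc >= 0.8:
        return "excellent"
    if auc >= 0.7:
        return "acceptable"
    return "less than acceptable"


# Hypothetical RT-IIV values in milliseconds; not study data
br_iiv = [310, 250, 400]
ur_iiv = [120, 150, 260]
auc = empirical_auc(br_iiv, ur_iiv)
print(round(auc, 3), auc_label(auc))
```

For a metric where lower scores indicate worse performance, the two argument lists would be swapped.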

Based on results of several studies of feigned neurocognitive performance (42;
129; 138; 188; 282), the likelihood of obtaining non-normal kurtosis and negative skew
among the BEAM variables was high. Additionally, each group in the study had a
sample size less than 50. As such, all BEAM variables were assessed for normality using
Shapiro-Wilk tests (208; 238). For variables with non-significant Shapiro-Wilk test
results, independent samples t-tests were used to test the null hypothesis that there were no
significant differences between BR and UR groups. Between-group effect sizes (Cohen’s
d) were determined for all comparisons between normally distributed variables. For
variables whose Shapiro-Wilk tests revealed significantly non-normal data distributions,
non-parametric Mann-Whitney U tests were conducted to test the null hypothesis that
there were no significant differences between BR and UR groups. Between-group effect
sizes were calculated for non-normally distributed variables using the r statistic (i.e., the
Z score obtained from the Mann-Whitney U test divided by the square root of the total
sample size).
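Both effect size statistics described above can be computed directly. In this minimal Python sketch, Cohen’s d uses the pooled standard deviation, and r is obtained from the large-sample normal approximation to the Mann-Whitney U statistic. The sketch omits the tie correction that statistical packages apply to the variance of U, so its Z (and therefore r) can differ slightly from SPSS output when scores are tied. The example values are hypothetical.

```python
import math
import statistics


def cohens_d(group1, group2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)  # n - 1 denominators
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd


def mann_whitney_r(group1, group2):
    """Effect size r = |Z| / sqrt(N) from the normal approximation to U.

    No tie correction is applied (a simplifying assumption).
    """
    n1, n2 = len(group1), len(group2)
    # U for group1: pairs in which the group1 score is higher, ties counted as 0.5
    u1 = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in group1 for y in group2)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return abs(z) / math.sqrt(n1 + n2)


print(round(cohens_d([2, 4, 6], [1, 2, 3]), 3))
print(round(mann_whitney_r([5, 6, 7, 8], [1, 2, 3, 4]), 3))
```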

Specific  Aim  2 

To  support  Specific  Aim  2  of  this  dissertation  project,  embedded  and  freestanding 
RVT variables were analyzed in the manner described above. Of the eighteen embedded
RVT variables submitted to ROC analyses, thirteen met the initial criteria of AUC > 0.7
and p < .05. All fifteen freestanding RVT variables submitted to ROC analyses met the
initial criteria of AUC > 0.7 and p < .05. Accordingly, only embedded and freestanding
RVT variables that demonstrated AUC > 0.9 were submitted to tests of normality,
homogeneity  of  variance,  and  group  differences.  Effect  sizes  were  calculated  for  all 
embedded  and  freestanding  RVT  variables  with  significant  differences  between  BR  and 
UR  groups. 
Next, all BEAM, embedded RVT, and freestanding RVT variables with AUC >
0.9 were submitted to a series of stepwise logistic regressions in order to determine the
variable from each test type that best predicted simulator study group membership.
Forward and backward stepwise logistic regressions were used to efficiently reduce the
number of variables for each test type. If a model’s chi-square value was significant at
the p < .05 level, variables with significant (p < .05) Wald chi-square statistics were
retained for subsequent stepwise logistic regressions. Once a “representative” variable
was identified for each test type (i.e., BEAM, embedded RVT, freestanding RVT), a
series of hierarchical logistic regression models was performed to assess the BEAM’s
incremental predictive value above and beyond existing response validity measures.
Unlike stepwise logistic regression models, which treat all candidate variables equally,
hierarchical logistic regressions enter variables in blocks in an order specified in
advance by the researcher. Accordingly, three analyses were performed: 1) BEAM above
and beyond Embedded RVTs; 2) BEAM above and beyond Freestanding RVTs; and 3)
BEAM above and beyond both Embedded and Freestanding RVTs.

For the first hierarchical logistic regression model, the representative Embedded
RVT variable was loaded into block 1 (“Embedded”) and compared between BR and UR
groups. Chi-square values, correct classification percentages, and Exp(B) values
were calculated. In the next step, the representative BEAM variable was loaded into
block 2 (“BEAM”). As before, chi-square values, correct classification percentages, and
Exp(B) values were calculated. Differences in chi-square values between block 1
(i.e., Embedded) and block 2 (i.e., BEAM) were also calculated to determine the
incremental predictive value of BEAM metrics above and beyond the classification
accuracy of embedded RVTs used in this study. A second hierarchical logistic regression
model comparing the representative Freestanding RVT variable in block 1 to the
representative BEAM variable in block 2 was also performed.

A third and final model was used to determine if the representative BEAM
variable provided incremental predictability of invalid responding above and beyond the
representative variables from both embedded and freestanding RVTs. The order of blocks
was as follows: 1) Embedded RVT variable; 2) Freestanding RVT variable; and 3)
BEAM variable. Changes in chi-square values on step 2 indicated whether or not the
representative freestanding RVT variable provided incremental predictive value of
invalid responding above and beyond the representative embedded RVT variable.
Changes in chi-square values on step 3 indicated whether or not the representative BEAM
variable provided significant predictive value above and beyond the representative
variables from both embedded and freestanding RVTs.
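The change in chi-square between blocks is a likelihood-ratio test: the -2 log-likelihood (-2LL) of the model without the new block minus the -2LL of the model with it. Because each block in these models adds a single representative variable, the change has 1 degree of freedom, and its p value has a closed form. The sketch below assumes the -2LL values are read from the regression output; the numbers in the example are hypothetical.

```python
import math


def chi_square_change_1df(neg2ll_without_block, neg2ll_with_block):
    """Likelihood-ratio chi-square for adding one predictor to a logistic model.

    Returns (chi-square change, p value) on 1 degree of freedom.
    """
    chi2 = neg2ll_without_block - neg2ll_with_block
    if chi2 <= 0:
        return chi2, 1.0  # the added block did not improve model fit
    # Upper-tail probability of a chi-square(1) variable: P(Z**2 > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical -2LL values: block 1 (Embedded) and block 2 (Embedded + BEAM)
chi2, p = chi_square_change_1df(40.0, 30.0)
print(f"chi-square change = {chi2:.2f}, p = {p:.4f}")
```

In the third model, the same calculation would be applied twice: once for step 2 (freestanding RVT added) and once for step 3 (BEAM added).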

Specific  Aim  3 

To  support  Specific  Aim  3  of  this  dissertation  project,  BEAM  and  embedded 
RVT  variables  that  demonstrated  outstanding  classification  accuracy  in  the  simulator 
study  were  compared  to  archival  data  of  unbiased  responders  with  a  history  of  mild 
traumatic brain injury (UR-mTBI). First, the mild TBI cohort from this project’s parent
study was screened against the exclusion criteria. Next, chi-square analyses and
ANOVAs  were  used  to  compare  the  UR  and  BR  groups  to  the  UR-mTBI  group  on 
factors  of  age,  gender,  years  of  education,  estimated  premorbid  intelligence,  and 
knowledge  of  head  injuries  to  identify  any  variables  that  needed  to  be  controlled  during 
subsequent  analyses. 
Shapiro-Wilk and Levene’s tests were performed to identify variables that
violated assumptions of normal distributions and homogeneity of variance, respectively,
in the UR-mTBI group. One-way ANOVA and Kruskal-Wallis tests were then
performed to identify omnibus group differences among the BR, UR, and UR-mTBI
groups. Post-hoc Tukey HSD and Mann-Whitney U tests were performed to identify
group  differences  and  effect  sizes  between  the  three  groups.  Sensitivity  to  invalid 
responding  (i.e.,  the  BR  group)  was  calculated  for  UR-mTBI  cutoff  scores  that 
demonstrated  85%  specificity  or  higher.  Given  the  absence  of  an  “invalid  responding” 
clinical  comparison  group,  predictive  value  statistics  (e.g.,  PPV,  NPV)  were  not 
calculated  in  this  step. 
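Setting a cutoff from the UR-mTBI distribution and then checking how many biased responders exceed it can be sketched as follows. The sketch assumes that higher scores indicate worse performance (as with RT-IIV or error percentages) and that scores strictly above the cutoff are flagged. It returns the lowest observed score that keeps at least the target proportion of the reference group at or below the cutoff. The scores are hypothetical.

```python
import math


def cutoff_for_specificity(reference_scores, target_specificity=0.85):
    """Lowest observed score such that at least target_specificity of the
    reference (valid) group scores at or below it; higher scores are flagged."""
    ordered = sorted(reference_scores)
    # Number of reference scores that must fall at or below the cutoff;
    # the small epsilon guards against floating-point error (e.g., 0.85 * 20)
    k = math.ceil(target_specificity * len(ordered) - 1e-9)
    return ordered[max(k, 1) - 1]


def specificity_at(cutoff, reference_scores):
    """Proportion of the reference group not flagged (score <= cutoff)."""
    return sum(s <= cutoff for s in reference_scores) / len(reference_scores)


def sensitivity_at(cutoff, invalid_scores):
    """Proportion of invalid responders flagged (score > cutoff)."""
    return sum(s > cutoff for s in invalid_scores) / len(invalid_scores)


# Hypothetical scores: 20 clinical reference cases and 5 biased responders
ur_mtbi = list(range(1, 21))
br = [15, 18, 19, 25, 30]
cut = cutoff_for_specificity(ur_mtbi, 0.85)
print(cut, specificity_at(cut, ur_mtbi), sensitivity_at(cut, br))
```

With a reference group of 19, specificity moves in steps of 1/19, so the smallest achievable value at or above 85% is 17/19 (about 89%).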

Alpha  Level 

A two-tailed alpha level of .05 was used for all ROC analyses. To mitigate the
likelihood of making a Type I error in subsequent between-groups analyses, only
variables with AUC > 0.7 were to be considered for subsequent analyses.
However, the majority of the BEAM, embedded RVT, and freestanding RVT variables
met this criterion, and a new cutoff of AUC > 0.9 was implemented to reduce Type I error.
Due to the exploratory nature of the study, correction factors were not planned at the
time of this dissertation project’s proposal. However, the relatively high number of
variables that met the AUC > 0.9 standard prompted this author to utilize Bonferroni
corrections for multiple comparisons. As such, all between-groups analyses for Specific Aims
1 and 2 (BR and UR groups) used a Bonferroni-corrected .002 level of significance, and
between-groups analyses for Specific Aim 3 (BR, UR, and UR-mTBI groups) used a
Bonferroni-corrected .003 level of significance.
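A Bonferroni correction divides the familywise alpha by the number of comparisons. The number of comparisons behind the .002 and .003 thresholds is not stated in this section. The example below therefore uses m = 25 only because .05 / 25 = .002; this is an illustrative assumption, not the count used in the study.

```python
def bonferroni_alpha(family_alpha, n_comparisons):
    """Per-comparison alpha that keeps the familywise error rate at or below family_alpha."""
    return family_alpha / n_comparisons


def is_significant(p_value, family_alpha, n_comparisons):
    """Compare a p value against the Bonferroni-corrected threshold."""
    return p_value < bonferroni_alpha(family_alpha, n_comparisons)


# m = 25 is a hypothetical count chosen only to reproduce the .002 threshold
alpha_per_test = bonferroni_alpha(0.05, 25)
print(alpha_per_test, is_significant(0.001, 0.05, 25))
```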
Power Analysis

Multiple  meta-analyses  of  embedded  and  freestanding  response  validity  tests  have 
found  Cohen’s  d  effect  sizes  greater  than  1.0 — a  very  large  group  difference — between 
two  groups  of  invalid  and  valid  responders  (137;  248;  272).  Armistead-Jehle  and  Buican 
(9)  reported  large  effect  sizes  across  neurocognitive  abilities  as  a  function  of  RVT 
performance, especially in tests of attention, processing speed, and memory. Effect sizes
this large reduce the sample size required to sufficiently power this dissertation
project’s analyses.

Power  analyses  were  computed  using  G*Power  Version  3  (75).  At  the  time  of 
this  project’s  proposal,  it  was  estimated  that  at  least  20  subjects  per  group  (BR,  UR,  UR- 
mTBI)  would  be  required  to  obtain  80%  power.  Data  collection  for  this  project  was 
conducted  from  June  2013  until  December  2013.  All  efforts  were  made  to  meet  and 
exceed  the  planned  sample  sizes  for  the  three  groups.  The  simulator  study’s  final  sample 
size of 50 (n = 24 in BR group; n = 26 in UR group) was found to sufficiently power
parametric (i.e., independent samples t-tests) and non-parametric (i.e., Mann-Whitney U
tests) between-groups analyses in Specific Aims 1 and 2 (75). The final UR-mTBI
sample size (n = 19), despite being one subject below the planned sample size, was found
to sufficiently power parametric (i.e., one-way ANOVAs) and non-parametric (i.e.,
Kruskal-Wallis tests) between-groups analyses in Specific Aim 3 (75).
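The link between expected effect size and required sample size can be illustrated with a normal approximation to the power of a two-sided, two-sample t-test. This is not the G*Power procedure, which uses the exact noncentral t distribution. The approximation is slightly optimistic in small samples, so the sketch only shows how d and group size combine. The effect size of d = 1.0 in the example reflects the meta-analytic estimates cited above.

```python
import math
from statistics import NormalDist


def approx_power_two_sample(d, n1, n2, alpha=0.05):
    """Approximate power of a two-sided, two-sample t-test for effect size d.

    Uses the normal approximation to the noncentral t distribution, so it runs
    slightly higher than exact calculations in small samples.
    """
    z = NormalDist()
    noncentrality = d * math.sqrt(n1 * n2 / (n1 + n2))
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic lands beyond the critical value in either tail
    return z.cdf(noncentrality - z_crit) + z.cdf(-noncentrality - z_crit)


print(round(approx_power_two_sample(1.0, 20, 20), 3))
print(round(approx_power_two_sample(1.0, 24, 26), 3))
```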
CHAPTER 3: Results


Fifty-seven  people  contacted  study  coordinators  to  participate  in  the  prospective, 
experimental  study  of  the  dissertation  project.  Of  these,  one  was  ineligible  due  to  history 
of  concussion  and  five  could  not  participate  due  to  scheduling  conflicts.  Of  the  fifty-one 
participants  who  attempted  to  complete  the  experiment,  one  did  not  complete  the  protocol 
due  to  technical  difficulties  with  the  lab  equipment.  As  a  result,  fifty  participants 
completed  the  experiment. 

All  study  participants  were  randomly  assigned  to  either  the  Biased  Responder 
(BR)  group  or  the  Unbiased  Responder  (UR)  group.  Demographic  information  for  these 
groups  is  shown  in  Table  3.  The  BR  group  (n  =  24)  and  UR  group  (n  =  26)  did  not  differ 
significantly on age, years of formal education, estimated premorbid intelligence, gender,
or  race/ethnicity.  Additionally,  the  BR  group  did  not  significantly  differ  from  the  UR 
group  on  knowledge  of  head  injury  sequelae  prior  to  group  assignment. 

Specific  Aim  1 

Examining  relationships  between  invalid  responding  and  performance  on  BEAM  metrics 
(Hypotheses  1A-1E) 

ROC analyses were conducted to identify the accuracy of each BEAM variable in
predicting membership in the BR or UR groups. Twenty-nine BEAM variables were
submitted to ROC analyses: one from the number of trials with invalid initial fixations
(Hypothesis 1A), twelve from saccadic reaction time (SacRT) and manual reaction time
(ManRT) for the overall measure and five trial types (Hypothesis 1B), twelve from
saccadic and manual RT intra-individual variability (IIV) for the overall measure and five
trial types20 (Hypothesis 1C), two from saccadic and manual commission error
percentage (Com%) from the inhibition trial type21 (Hypothesis 1D), and two from the
saccadic and manual omission error percentage (Om%) from the non-inhibition trial
types (Hypothesis 1E).

As stated earlier in the manuscript, at least 10 successfully recorded trials per trial
type were required to generate BEAM summary metrics. Several participants in the
biased responding group performed in such a way that the BEAM could not
calculate summary reaction time metrics (e.g., too many omissions, too many trials with
data loss). As a result, a small amount of BEAM data (1.9% of all BEAM data) was
missing, exclusively from the BR group. The results from the ROC analyses are shown
in Table 4.
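The minimum-trial rule explains why some BR participants have missing BEAM values. The sketch below assumes a simple per-trial encoding in which a trial with an omission or data loss is recorded as None. It treats RT-IIV as the within-person standard deviation of reaction times, consistent with how RT-IIV is interpreted later in this chapter. The BEAM's actual scoring software is not described here, so this illustrates the rule only.

```python
import statistics

MIN_TRIALS = 10  # minimum successfully recorded trials per trial type (from the text)


def summarize_trial_type(reaction_times_ms):
    """Return (mean RT, RT-IIV) for one participant and trial type, or None.

    reaction_times_ms: one entry per trial; None marks an omission or a trial
    with data loss (a hypothetical encoding, not the BEAM's file format).
    """
    valid = [rt for rt in reaction_times_ms if rt is not None]
    if len(valid) < MIN_TRIALS:
        return None  # too few usable trials: the summary metric is left missing
    return statistics.mean(valid), statistics.stdev(valid)


complete = [300] * 5 + [400] * 5 + [None, None]  # 10 usable trials
sparse = [300] * 9 + [None] * 3                  # only 9 usable trials
print(summarize_trial_type(complete))
print(summarize_trial_type(sparse))
```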

Twenty-six BEAM variables demonstrated statistically significant classification
accuracy (p < .05); twenty-five BEAM variables demonstrated acceptable classification
accuracy (area under the curve [AUC] > 0.70). The Number of Invalid Fixations variable
was statistically significant (p = .02) but demonstrated less-than-acceptable classification
accuracy (AUC = 0.69; 95% CI = 0.53 to 0.84). As such, Hypothesis 1A was not
confirmed. SacRT-UCG (p = .19; AUC = 0.61; 95% CI = 0.44 to 0.78), SacRT-UC (p =
.25; AUC = 0.60; 95% CI = 0.42 to 0.77), and SacOm% (p = .25; AUC = 0.60; 95% CI =
0.44 to 0.75) failed to reach statistical significance and demonstrated classification
accuracy below the acceptable range.


20  The  five  trial  types  with  RT  and  RT-IIV  (i.e.,  the  non-inhibition  trial  types)  are  Directional  Cue 
(DC),  Nondirectional  Cue  (NDC),  Misdirectional  Cue  (MDC),  Uncued  with  Gap  (UCG),  and  Uncued 
(UC). 

21 The inhibition trial type is Directional Cue-Red Arrow (DCR).
ROC analyses indicated that ten of the twelve BEAM RT variables achieved
acceptable-to-excellent classification accuracy (AUC range: 0.71-0.89). As such,
Hypothesis 1B was partially confirmed. Hypothesis 1C was confirmed, as all twelve
BEAM RT-IIV variables demonstrated excellent-to-outstanding classification accuracy
(AUC range: 0.85-0.97). Hypothesis 1D was confirmed, as both SacCom% (AUC =
0.94; 95% CI = 0.87 to 1.00) and ManCom% (AUC = 0.80; 95% CI = 0.66 to 0.93)
variables demonstrated excellent-to-outstanding classification accuracy. Lastly,
Hypothesis 1E was partially confirmed, as ManOm% (AUC = 0.94; 95% CI = 0.87 to
1.00) demonstrated outstanding classification accuracy while SacOm% demonstrated
unacceptable classification accuracy.

Of the ten BEAM RT variables with AUC > 0.7, four were saccadic RT and six
were manual RT. SacRT-MDC (AUC = 0.81; 95% CI = 0.69 to 0.94) demonstrated
excellent classification accuracy, while SacRT-DC (AUC = 0.77; 95% CI = 0.65 to 0.90),
SacRT-Overall (AUC = 0.74; 95% CI = 0.60 to 0.88), and SacRT-NDC (AUC = 0.71;
95% CI = 0.57 to 0.86) demonstrated acceptable classification accuracy. All manual RT
variables demonstrated excellent classification accuracy: ManRT-Overall (AUC = 0.89;
95% CI = 0.80 to 0.99), ManRT-UCG (AUC = 0.89; 95% CI = 0.78 to 0.99),
ManRT-NDC (AUC = 0.88; 95% CI = 0.78 to 0.98), ManRT-MDC (AUC = 0.86; 95% CI = 0.74
to 0.99), ManRT-DC (AUC = 0.84; 95% CI = 0.73 to 0.96), and ManRT-UC (AUC =
0.83; 95% CI = 0.70 to 0.95).

All twelve (six saccadic and six manual) BEAM RT-IIV variables achieved
excellent-to-outstanding levels of classification accuracy. SacRT-IIV-Overall (AUC =
0.97; 95% CI = 0.93 to 1.00), ManRT-IIV-Overall (AUC = 0.97; 95% CI = 0.92 to 1.00),
ManRT-IIV-DC (AUC = 0.97; 95% CI = 0.92 to 1.00), ManRT-IIV-NDC (AUC = 0.96;
95% CI = 0.90 to 1.00), ManRT-IIV-MDC (AUC = 0.96; 95% CI = 0.90 to 1.00),
SacRT-IIV-DC (AUC = 0.93; 95% CI = 0.86 to 1.00), ManRT-IIV-UCG (AUC = 0.93;
95% CI = 0.84 to 1.00), SacRT-IIV-MDC (AUC = 0.92; 95% CI = 0.83 to 1.00),
ManRT-IIV-UC (AUC = 0.92; 95% CI = 0.84 to 0.99), and SacRT-IIV-UCG (AUC =
0.90; 95% CI = 0.80 to 1.00) demonstrated outstanding classification accuracy.
SacRT-IIV-UC (AUC = 0.85; 95% CI = 0.75 to 0.96) and SacRT-IIV-NDC (AUC = 0.85; 95%
CI = 0.74 to 0.96) demonstrated excellent classification accuracy.

To reduce the number of variables submitted to further analyses, only variables
that demonstrated outstanding classification accuracy (AUC ≥ 0.9) were considered.
Using this criterion, twelve BEAM variables were retained: SacRT-IIV-Overall,
ManRT-IIV-Overall, ManRT-IIV-DC, ManRT-IIV-NDC, ManRT-IIV-MDC, ManOm%,
SacCom%, SacRT-IIV-DC, ManRT-IIV-UCG, SacRT-IIV-MDC, ManRT-IIV-UC, and
SacRT-IIV-UCG. Notably, ten of the twelve outstanding BEAM variables were reaction
time intra-individual variability metrics. The other two BEAM variables with outstanding
classification accuracy were SacCom% and ManOm%.

Shapiro-Wilk tests of normality revealed significantly non-normal distributions
for four BEAM variables: ManRT-IIV-UCG (BR group: W(21) = .87, p = .01),
ManRT-IIV-UC (BR group: W(23) = .90, p = .03), SacCom% (UR group: W(26) = .81, p < .001),
and ManOm% (UR group: W(26) = .28, p < .001; BR group: W(24) = .77, p < .001). As
shown in Table 5, Mann-Whitney U tests with a Bonferroni-corrected .002 level of
significance indicated that biased responders performed significantly worse (i.e., higher error
rate, greater RT variability) than unbiased responders on the following variables:
ManRT-IIV-UCG (U = 41.0, p < .001, r = .72), ManRT-IIV-UC (U = 51.0, p < .001, r =
.71), ManOm% (U = 36.5, p < .001, r = .78), and SacCom% (U = 39.0, p < .001, r = .75).
Very large effect sizes were found for all between-group comparisons (r range: .71-.78).

Shapiro-Wilk tests for the other eight BEAM variables with outstanding
classification accuracy were not significant (p > .05) in either the BR or UR group.
Independent samples t-tests were conducted on the eight BEAM variables with normal
distributions. A Bonferroni correction was applied, so all effects are reported at a .002
level of significance. As shown in Table 6, the t-tests indicated that reaction time
intra-individual variability was significantly greater in the BR group than the UR group for the
following variables: SacRT-IIV-DC, t(48) = 7.72; SacRT-IIV-Overall, t(48) = 9.49;
ManRT-IIV-DC, t(47) = 10.2; ManRT-IIV-NDC, t(47) = 8.99; ManRT-IIV-MDC, t(42)
= 7.49; ManRT-IIV-Overall, t(47) = 10.2; SacRT-IIV-MDC, t(32) = 7.21; and
SacRT-IIV-UCG, t(33) = 6.86. Levene’s test indicated unequal variances for SacRT-IIV-MDC
(F = 7.64, p = .008) and SacRT-IIV-UCG (F = 4.19, p = .046), so degrees of freedom
were adjusted from 48 to 32 and 47 to 33, respectively. Very large effect sizes were
found for all between-group comparisons (Cohen’s d range: 1.99-2.90). Classification
accuracy statistics for the twelve best BEAM variables are shown in Table 7.
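When Levene’s test indicates unequal variances, the usual adjustment is Welch’s t-test. Its Welch-Satterthwaite degrees of freedom are generally non-integer and smaller than n1 + n2 - 2, and statistical packages typically report this as the “equal variances not assumed” result. The sketch below assumes that this is the adjustment behind the reduced degrees of freedom reported above; the example data are hypothetical.

```python
import statistics


def welch_t(group1, group2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    # Squared standard errors of each group mean
    se1_sq = statistics.variance(group1) / n1
    se2_sq = statistics.variance(group2) / n2
    t = (statistics.mean(group1) - statistics.mean(group2)) / (se1_sq + se2_sq) ** 0.5
    df = (se1_sq + se2_sq) ** 2 / (se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1))
    return t, df


# Equal variances and equal n: df equals n1 + n2 - 2
print(welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
# Very unequal variances: df shrinks well below n1 + n2 - 2
print(welch_t([1, 2, 3, 4, 5], [0, 10, 20, 30, 40]))
```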

A series of stepwise logistic regressions was conducted to identify the best
predictors of invalid responding among the twelve BEAM variables. To avoid
overfitting the regression models, the twelve BEAM variables were divided into five
groups of similar variables:

1) SacCom% and ManOm%;

2) SacRT-IIV-DC, SacRT-IIV-MDC, and SacRT-IIV-UCG;

3) ManRT-IIV-DC, ManRT-IIV-NDC, and ManRT-IIV-MDC;

4) ManRT-IIV-UCG and ManRT-IIV-UC; and

5) SacRT-IIV-Overall and ManRT-IIV-Overall.

All  intra-individual  variability  variables  were  converted  to  milliseconds  in  order  to  obtain 
interpretable  odds  ratios.  Significant  predictor  variables  from  each  group  were  identified 
and  loaded  into  subsequent  stepwise  regressions  until  the  best  predictors  of  the  biased 
responder  group  were  identified  (see  Figure  1). 

Analyses indicated that SacRT-IIV-Overall and ManRT-IIV-Overall, as a set, most
reliably differentiated the BR and UR groups, χ²(2, N = 49) = 53.6, p < .001.
Nagelkerke's R² of .89 indicated a strong relationship between prediction and grouping.
Overall prediction success was 95.9% (96.2% for the UR group and 95.7% for the BR group).
Both SacRT-IIV-Overall (χ²(1) = 4.80, p = .03) and ManRT-IIV-Overall (χ²(1) = 5.36, p
= .02) contributed significantly to group prediction. Exp(B) values indicated that each
one-millisecond increase in SacRT-IIV-Overall or ManRT-IIV-Overall multiplies the odds of
a person being an invalid responder by 1.09 and 1.08, respectively. Put a different way,
every 10-millisecond increase in the standard deviation of an individual's overall saccadic
or manual reaction times more than doubles the odds of that person being an invalid
responder. Additional ROC analyses indicated that the combined AUC of SacRT-IIV-Overall
+ ManRT-IIV-Overall was nearly perfect (AUC = 0.99; 95% CI = 0.96 to 1.00).
SacRT-IIV-Overall and ManRT-IIV-Overall each contributed an additional 0.02 to the AUC
above and beyond their individual classification accuracy.
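Logistic regression is linear in the log-odds, so a per-unit Exp(B) compounds multiplicatively over larger changes. The odds ratio for a k-unit increase is Exp(B) raised to the power k, not k times Exp(B). The short Python sketch below applies this to the rounded Exp(B) values reported above (1.09 and 1.08 per millisecond). Because those inputs are rounded, its results are approximate.

```python
import math


def odds_ratio_per_k_units(exp_b: float, k: float) -> float:
    """Odds ratio for a k-unit increase in a predictor, given its per-unit
    odds ratio Exp(B): exp(k * B) = Exp(B) ** k."""
    b = math.log(exp_b)  # recover the raw logistic coefficient B
    return math.exp(k * b)


for name, exp_b in [("SacRT-IIV-Overall", 1.09), ("ManRT-IIV-Overall", 1.08)]:
    ratio = odds_ratio_per_k_units(exp_b, 10)
    print(f"{name}: odds ratio per 10 ms = {ratio:.2f}")
```

Both ratios exceed 2 (about 2.37 and 2.16). That is the basis for the statement that a 10-millisecond increase more than doubles the odds.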
Specific Aim 2


Comparing the classification accuracy of BEAM metrics for invalid responding with that of
existing response validity tests (Hypotheses 2A-2C).

Embedded  RVTs 

Eighteen variables from embedded response validity tests (RVTs) were submitted
to ROC analyses. As seen in Table 8, thirteen of these variables demonstrated acceptable-
or-greater classification accuracy. TMT A time (AUC = 0.74; 95% CI = 0.59 to 0.90, p =
.003) and TMT B time (AUC = 0.79; 95% CI = 0.66 to 0.91, p = .001) demonstrated acceptable
classification accuracy. Seven CPT-II variables demonstrated significant (p < .001)
classification accuracy that ranged from acceptable to outstanding: RT ISI Change (AUC
= 0.79; 95% CI = 0.67 to 0.91), Perseverations (AUC = 0.84; 95% CI = 0.73 to 0.96),
Detectability (AUC = 0.88; 95% CI = 0.77 to 0.98), Variability (AUC = 0.89; 95% CI =
0.80 to 0.98), Hit RT Standard Error (AUC = 0.90; 95% CI = 0.82 to 0.99), Omissions
(AUC = 0.91; 95% CI = 0.82 to 1.00), and Commissions (AUC = 0.93; 95% CI = 0.85 to
1.00). All four variables from the WAIS-IV Digit Span subtest demonstrated outstanding
and significant (p < .001) classification accuracy: Age-Corrected Scaled Score (ACSS;
AUC = 0.94; 95% CI = 0.88 to 1.00), Reliable Digit Span-Revised (RDS-R; AUC = 0.94;
95% CI = 0.88 to 1.00), Alternative Reliable Digit Span (ARDS; AUC = 0.94; 95% CI =
0.86 to 1.00), and Reliable Digit Span (RDS; AUC = 0.93; 95% CI = 0.84 to 1.00).
CPT-II Standard Error Interstimulus Interval Change (AUC = 0.67; 95% CI = 0.52 to
0.82, p = .04) was significant but demonstrated below-acceptable classification accuracy.
CPT-II Hit RT (AUC = 0.52; 95% CI = 0.35 to 0.69, p = .83), Hit RT Block Change
(AUC = 0.50; 95% CI = 0.33 to 0.66, p = .96), Hit RT Standard Error Block Change
(AUC = 0.54; 95% CI = 0.37 to 0.10, p = .67), and Response Style (AUC = 0.67; 95% CI
= 0.51 to 0.82, p = .05) did not significantly differentiate between biased and unbiased
responders.
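The AUC values in this section have a direct interpretation. Each is the probability that a randomly chosen biased responder receives a more suspicious score than a randomly chosen unbiased responder, where 0.50 is chance and 1.00 is perfect separation. The minimal Python sketch below computes this empirical AUC by pairwise comparison and counts ties as one half. It assumes higher scores are more suspicious, and the toy scores are hypothetical.

```python
def empirical_auc(biased: list[float], unbiased: list[float]) -> float:
    """P(biased score > unbiased score), with ties counted as 0.5.
    Equivalent to the Mann-Whitney U statistic divided by n1 * n2."""
    wins = 0.0
    for b in biased:
        for u in unbiased:
            if b > u:
                wins += 1.0
            elif b == u:
                wins += 0.5
    return wins / (len(biased) * len(unbiased))


# Hypothetical scores where higher values are more suspicious (e.g. error counts).
print(empirical_auc([5, 6, 7], [1, 2, 6]))

# For measures where lower scores are more suspicious (e.g. Digit Span),
# negate both groups so that "more suspicious" again means "higher".
print(empirical_auc([-s for s in [8, 9, 10]], [-s for s in [12, 14, 16]]))
```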

As with the BEAM variables in Aim 1, only embedded RVT variables with
outstanding (AUC > 0.9) classification accuracy were submitted to further analyses. Of
these  seven  variables,  four  were  from  the  WAIS-IV  Digit  Span:  Reliable  Digit  Span- 
Revised  (RDS-R),  Alternative  Reliable  Digit  Span  (ARDS),  Age-Corrected  Scaled  Score 
(ACSS),  and  Reliable  Digit  Span  (RDS).  Three  variables  from  the  CPT-II — Omissions, 
Commissions,  and  Hit  RT  Standard  Error  (SE) — were  also  included  in  subsequent 
analyses. 

Shapiro-Wilk tests indicated that RDS-R (UR group: W(26) = .90, p = .015),
ARDS (UR group: W(26) = .92, p = .040), CPT-II Omissions (UR group: W(26) = .64, p
< .001; BR group: W(24) = .66, p < .001), and CPT-II Hit RT SE (BR group: W(24) =
.80, p < .001) were significantly non-normal. As shown in Table 9, Mann-Whitney U
tests with a Bonferroni-corrected significance level of .002 indicated that RDS-R
scores in the BR group (Mdn = 11.0) were significantly lower than in the UR group (Mdn =
17.0), U = 35.0, p < .001, r = .77, and that ARDS scores in the BR group (Mdn = 8.50)
were significantly lower than in the UR group (Mdn = 12.0), U = 38.0, p < .001, r = .76.
Additional Mann-Whitney U tests indicated that the BR group (Mdn = 6.50) made
significantly more Omission errors on the CPT-II than the UR group (Mdn = 0.00), U =
57.5, p < .001, r = .72, and that Hit RT SE was significantly higher in the BR group (Mdn
= 7.56) than in the UR group (Mdn = 4.20), U = 61.0, p < .001, r = .69.
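The effect sizes reported with the Mann-Whitney U tests are r values. These are conventionally computed as |z| / sqrt(N) from the normal approximation to U. The minimal Python sketch below omits the tie correction. With the study's group sizes (26 UR, 24 BR) and U = 35.0, it gives r ≈ .76, close to the reported .77. Statistical packages typically apply a tie correction, which shrinks the standard error of U and enlarges |z|, and that may explain the small gap.

```python
import math


def mann_whitney_r(u: float, n1: int, n2: int) -> float:
    """Effect size r = |z| / sqrt(N) for a Mann-Whitney U statistic,
    using the normal approximation without a tie correction."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return abs(z) / math.sqrt(n1 + n2)


print(round(mann_whitney_r(35.0, 26, 24), 3))  # RDS-R comparison; prints 0.761
```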

Shapiro-Wilk tests for ACSS, RDS, and CPT-II Commissions were not significant
(p > .05) in either the BR or UR group, so independent samples t-tests were conducted on
these normally distributed embedded RVT variables. A Bonferroni correction was
applied, so all effects are reported at a .002 level of significance. As shown in Table 10,
independent samples t-tests on WAIS-IV Digit Span variables indicated that the BR
group had significantly lower ACSS, t(48) = 7.83, p < .001, and RDS, t(48) = 6.91, p <
.001, scores than the UR group. Additionally, the BR group made significantly more
Commission errors on the CPT-II than the UR group, t(48) = 7.72, p < .001. Very large
effect sizes were found for all between-group comparisons among the normally
distributed embedded RVT variables (Cohen's d range: 1.94-2.23). Classification
accuracy statistics for the seven best embedded RVT variables are shown in Table 11.
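For normally distributed variables, Cohen's d can be computed from raw scores using the pooled standard deviation. It can also be recovered from a pooled-variance t statistic as d = t * sqrt(1/n1 + 1/n2). The Python sketch below shows both approaches as an illustration only. The dissertation may have computed d from raw data, so conversions from the rounded t values need not match the reported range exactly.

```python
import math
import statistics


def cohens_d(group1, group2):
    """Cohen's d using the pooled (equal-variance) standard deviation."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd


def d_from_t(t, n1, n2):
    """Convert an independent-samples (pooled-variance) t statistic to d."""
    return t * math.sqrt(1 / n1 + 1 / n2)


# Hypothetical raw scores: means 4 and 2, both SDs 1, so d = 2.0.
print(cohens_d([3, 4, 5], [1, 2, 3]))
print(round(d_from_t(7.83, 26, 24), 2))  # ACSS comparison, t(48) = 7.83; prints 2.22
```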

A series of stepwise logistic regressions was conducted to identify the best
predictors of invalid responding among the seven embedded RVT variables. To avoid
overfitting the models, the seven embedded RVT variables were divided into two
groups  of  similar  variables,  one  from  the  WAIS-IV  Digit  Span  subtest  (ACSS,  RDS, 
RDS-R,  and  ARDS),  and  one  from  the  CPT-II  (Omissions,  Commissions,  and  Hit  RT 
SE).  Significant  predictor  variables  from  each  group  were  identified  and  loaded  into 
subsequent  stepwise  regressions  until  the  best  predictors  of  the  biased  responder  group 
were  identified. 

Analyses indicated that RDS-R and CPT-II Commissions, as a set, most reliably
differentiated the BR and UR groups, χ²(2, N = 50) = 49.6, p < .001. Nagelkerke's R² of
.84 indicated a strong relationship between prediction and grouping. Overall prediction
success was 94.0% (96.2% for the UR group and 91.7% for the BR group). CPT-II Commissions
(χ²(1) = 5.95, p = .02) and RDS-R (χ²(1) = 7.11, p = .008) both made significant
contributions to group prediction. Exp(B) values indicated that each additional CPT-II
Commission error increases the odds of a person being an invalid responder by a factor
of 1.33. Because higher RDS-R scores reflect better performance, each one-point decrease
in RDS-R increases the odds of a person being an invalid responder by a factor of 2.27.
Additional ROC analyses indicated that the combined AUC of RDS-R + CPT-II
Commissions was 0.97 (95% CI = 0.93 to 1.00). RDS-R contributed an additional 0.046
to the overall classification accuracy above and beyond CPT-II Commissions, and CPT-II
Commissions contributed an additional 0.027 to the classification accuracy above and
beyond RDS-R.
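Nagelkerke's R² rescales the Cox and Snell R² so that its maximum is 1. Both can be recovered from the omnibus model χ² and the group sizes alone, and the Python sketch below does this. The group sizes are assumptions inferred from the text: 26 UR and 24 BR when N = 50, and 26 UR and 23 BR for the N = 49 BEAM model, which matches the reported 95.7% (22/23) BR hit rate. With those inputs, the printed values round to the R² values reported in this chapter.

```python
import math


def nagelkerke_r2(model_chi2: float, n_ur: int, n_br: int) -> float:
    """Nagelkerke R^2 for a binary logistic regression, recovered from the
    omnibus model chi-square (-2LL_null minus -2LL_model) and group sizes."""
    n = n_ur + n_br
    # Log-likelihood of the intercept-only (null) model for a binary outcome.
    ll_null = n_ur * math.log(n_ur / n) + n_br * math.log(n_br / n)
    # Cox & Snell R^2 = 1 - (L_null / L_model) ** (2 / n) = 1 - exp(-chi2 / n).
    cox_snell = 1 - math.exp(-model_chi2 / n)
    # Largest attainable Cox & Snell R^2 = 1 - L_null ** (2 / n).
    max_cox_snell = 1 - math.exp(2 * ll_null / n)
    return cox_snell / max_cox_snell


print(f"{nagelkerke_r2(49.6, 26, 24):.2f}")  # RDS-R + CPT-II Commissions; prints 0.84
print(f"{nagelkerke_r2(49.9, 26, 24):.2f}")  # VSVT Total RT + MSVT FR; prints 0.84
print(f"{nagelkerke_r2(53.6, 26, 23):.2f}")  # SacRT- + ManRT-IIV-Overall; prints 0.89
```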

Freestanding  RVTs 

Fifteen variables from freestanding RVTs were submitted to ROC analyses. As
seen in Table 12, all fifteen variables demonstrated significant (p < .001) classification
accuracy that ranged from excellent to outstanding. The VSVT's Total RT[22] SD (AUC =
0.87; 95% CI = 0.77 to 0.98), Easy RT SD (AUC = 0.85; 95% CI = 0.74 to 0.96), and
Easy Correct (AUC = 0.85; 95% CI = 0.73 to 0.96) variables demonstrated excellent
classification accuracy. MSVT Immediate Recognition % Correct (IR; AUC = 0.90; 95%
CI = 0.80 to 1.00), MSVT Delayed Recognition % Correct (DR; AUC = 0.92; 95% CI =
0.84 to 1.00), MSVT Consistency % Correct (CNS; AUC = 0.94; 95% CI = 0.87 to 1.00),
MSVT Paired Associates % Correct (PA; AUC = 0.93; 95% CI = 0.85 to 1.00), MSVT
Free Recall % Correct (FR; AUC = 0.96; 95% CI = 0.91 to 1.00), MSVT Fail Any
Subtest (AUC = 0.90; 95% CI = 0.80 to 1.00), VSVT Easy RT (AUC = 0.90; 95% CI =
0.81 to 0.98), VSVT Difficult RT (AUC = 0.91; 95% CI = 0.82 to 0.99), VSVT Difficult
RT SD (AUC = 0.90; 95% CI = 0.81 to 0.99), and VSVT Total RT (AUC = 0.93; 95% CI
= 0.86 to 1.00) demonstrated outstanding classification accuracy. It should be noted that
MSVT Fail Any Subtest is a dichotomous variable, unlike the fourteen other freestanding
RVT variables. Two VSVT variables, Difficult Correct and Total Correct, were perfect
(AUC = 1.00; 95% CI = 1.00 to 1.00) in their ability to identify invalid responding in this
study's experimental sample.

22 Response Latency will be abbreviated as "RT" for consistency with other variables in this study.

Six of nine VSVT variables (Difficult Correct, Total Correct, Total RT, Difficult
RT, Difficult RT SD, and Easy RT) and all six MSVT variables (Fail Any Subtest, IR,
DR, CNS, PA, and FR) demonstrated outstanding classification accuracy and were
submitted to additional analyses. Shapiro-Wilk tests of normality revealed significantly
non-normal distributions for all eleven continuous freestanding RVT variables: VSVT
Difficult Correct (UR group: W(26) = .68, p < .001), VSVT Total Correct (UR group:
W(26) = .l\, p < .001), VSVT Easy RT (UR group: W(26) = .75, p < .001; BR group:
W(24) = .87, p = .004), VSVT Difficult RT (UR group: W(26) = .92, p = .04; BR group:
W(24) = .78, p < .001), VSVT Total RT (UR group: W(26) = .87, p = .004; BR group:
W(24) = .82, p = .001), VSVT Difficult RT SD (UR group: W(26) = .82, p < .001; BR
group: W(24) = .79, p < .001), MSVT IR (BR group: W(24) = .89, p = .01), MSVT DR
(UR group: W(26) = .38, p < .001), MSVT CNS (UR group: W(26) = .38, p < .001),
MSVT PA (UR group: W(26) = .20, p < .001), and MSVT FR (UR group: W(26) = .85, p
= .002).

As shown in Table 13, Mann-Whitney U tests with a Bonferroni-corrected significance
level of .002 indicated statistically significant differences between the UR
and BR groups on VSVT Difficult Correct (U = 0.00, p < .001, r = .87), VSVT Total
Correct (U = 0.00, p < .001, r = .87), VSVT Easy RT (U = 63.0, p < .001, r = .68), VSVT
Difficult RT (U = 59.0, p < .001, r = .69), VSVT Difficult RT SD (U = 64.0, p < .001, r =
.68), VSVT Total RT (U = 44.5, p < .001, r = .73), MSVT IR (U = 65.0, p < .001, r =
.78), MSVT DR (U = 48.0, p < .001, r = .78), MSVT CNS (U = 37.5, p < .001, r = .81),
MSVT PA (U = 42.0, p < .001, r = .82), and MSVT FR (U = 28.0, p < .001, r = .79).
Additionally, members of the BR group failed at least one MSVT subtest (thus failing the
MSVT) at a significantly higher rate than the UR group, χ²(5, N = 50) = 33.2, p < .001, r
= .81. Classification accuracy statistics for the twelve best freestanding RVT variables
are shown in Table 14.

A series of stepwise logistic regressions was conducted to identify the best
predictors of invalid responding among the twelve freestanding RVT variables. VSVT
Difficult Correct and VSVT Total Correct were excluded from these logistic regressions
because their perfect classification accuracy overfit the regression models. To avoid
overfitting, the remaining ten freestanding RVT variables were
divided into four groups of similar variables:

1)  VSVT  Easy  RT  and  VSVT  Difficult  RT; 

2)  VSVT  Total  RT  and  VSVT  Difficult  RT  SD; 

3)  MSVT  IR,  MSVT  DR,  and  MSVT  CNS;  and 

4)  MSVT  PA,  MSVT  FR,  and  MSVT  Fail  Any  Subtest. 

Significant  predictor  variables  were  identified  and  loaded  into  subsequent  stepwise 
regressions  until  the  best  predictors  of  the  biased  responder  group  (behind  VSVT 
Difficult  Correct  and  VSVT  Total  Correct)  were  identified. 

Analyses indicated that VSVT Total RT and MSVT FR, as a set, most reliably
differentiated the BR and UR groups, χ²(2, N = 50) = 49.9, p < .001. Nagelkerke's R² of
.84 indicated a strong relationship between prediction and grouping. Overall prediction
success was 92.0% (92.3% for the UR group and 91.7% for the BR group). VSVT Total RT
(χ²(1) = 4.77, p = .03) and MSVT FR (χ²(1) = 6.21, p = .01) both made significant
contributions to group prediction. Exp(B) values indicated that each additional second of
VSVT Total RT increases the odds of a person being an invalid responder by a factor of
more than 17. Because higher MSVT Free Recall scores reflect better performance, each
one-percentage-point decrease in MSVT FR increases the odds of a person
being an invalid responder by a factor of 1.19. Put another way, because each MSVT Free
Recall item is worth 5 percentage points, each missed item multiplies the odds of the
participant being an invalid responder by approximately 2.4 (1.19 to the fifth power).
Additional ROC analyses indicated that the combined AUC of VSVT Total RT and MSVT FR
was 0.99 (95% CI = 0.96 to 1.00). VSVT Total RT contributed an additional 0.031 to the
overall classification accuracy above and beyond MSVT FR, and MSVT FR contributed an
additional 0.057 to the classification accuracy above and beyond VSVT Total RT.

Hierarchical  Logistic  Regressions 

Once variables from the three test types were reduced to the best and most
representative variables (see Figure 2), hierarchical logistic regressions were conducted
to test the incremental predictive ability of the BEAM above and beyond embedded RVTs
(Hypothesis 2A), freestanding RVTs (Hypothesis 2B), and both embedded and
freestanding RVTs as a set (Hypothesis 2C). The variables that demonstrated the best
classification accuracy within their respective test type (i.e., BEAM, embedded,
freestanding) were used in these analyses. Any variables that individually demonstrated
perfect classification accuracy were excluded from these analyses because perfect
classification overfits regression models in SPSS. Based on the concurrent validity results
above, Reliable Digit Span-Revised (RDS-R) was identified as the best, most
representative embedded RVT variable, and MSVT Free Recall % Correct (FR) as the best,
most representative freestanding RVT variable. Either Overall Saccadic or Manual Reaction
Time Intra-Individual Variability (SacRT-IIV-Overall; ManRT-IIV-Overall) could be used to
represent the BEAM, since the two variables demonstrated nearly identical classification
accuracy and each added predictive value over the other.
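Perfect classification is a problem for logistic regression because of complete separation. When a single cutoff sorts every case correctly, the likelihood keeps increasing as the slope grows. No finite maximum-likelihood estimate exists, so software cannot return stable parameter estimates. The Python sketch below checks for separation and shows the log-likelihood climbing toward zero on hypothetical, perfectly separated toy data. The model form and cutoff are illustrative assumptions.

```python
import math


def completely_separated(invalid_scores, valid_scores):
    """True when one cutoff classifies every case correctly, i.e. the two
    groups' score ranges do not overlap (an AUC of exactly 0 or 1)."""
    return (max(valid_scores) < min(invalid_scores)
            or max(invalid_scores) < min(valid_scores))


def log_likelihood(slope, scores, labels, cutoff):
    """Log-likelihood of the one-predictor logistic model
    P(invalid) = 1 / (1 + exp(-slope * (score - cutoff)))."""
    total = 0.0
    for x, y in zip(scores, labels):
        p = 1.0 / (1.0 + math.exp(-slope * (x - cutoff)))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


# Hypothetical, perfectly separated toy data (1 = invalid responder).
valid, invalid = [1.0, 2.0, 3.0], [5.0, 6.0, 7.0]
scores, labels = valid + invalid, [0, 0, 0, 1, 1, 1]
print(completely_separated(invalid, valid))
for slope in (1, 5, 25):
    # The likelihood keeps rising toward 0 as the slope grows, so no finite
    # maximum-likelihood estimate exists.
    print(slope, log_likelihood(slope, scores, labels, cutoff=4.0))
```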

As seen in Table 15, several models were used to test incremental predictive
validity. In the first model, RDS-R was entered in block 1 and SacRT-IIV-Overall was
entered in block 2. Hypothesis 2A was confirmed, as the representative BEAM variable
added significantly to the model above and beyond the representative embedded RVT
variable, χ²(2, N = 50) = 58.1, p < .001, R²-change = .18. In the second model, MSVT FR
was entered in block 1 and SacRT-IIV-Overall was entered in block 2. Parameter
estimates could not be obtained because the model demonstrated exact classification
accuracy and was overfit. As such, a separate model was run with ManRT-IIV-Overall
entered in block 2. The alternate representative BEAM variable added significantly to
the model above and beyond the representative freestanding RVT variable, χ²(2, N =
49) = 53.3, p < .001, R²-change = .15. Because freestanding RVTs produced two
variables with perfect classification, and the BEAM cannot outperform perfection,
Hypothesis 2B was partially confirmed.

A third model was conducted to compare embedded RVTs, freestanding RVTs, and the BEAM.
RDS-R was entered into block 1 and MSVT FR was entered into block 2. As expected,
the representative freestanding RVT variable added significantly to the model above and
beyond the representative embedded RVT variable, χ²(2, N = 50) = 52.5, p < .001,
R²-change = .13. In block 2, overall prediction success was 92.0% (92.3% for the UR group
and 91.7% for the BR group). When SacRT-IIV-Overall was entered into block 3 of this final
model, the model achieved exact classification accuracy and was overfit. The same result
was found when ManRT-IIV-Overall was entered into block 3 as the representative
BEAM variable. While both the manual and saccadic representative BEAM variables
improved overall prediction success from 92.0% to 100%, the significance of this
increase in predictive value above and beyond both embedded and freestanding RVTs
could not be tested in SPSS Version 20. Additionally, the BEAM's improvements to the
model were obtained while using the third best freestanding RVT variable identified in
the simulator study, MSVT FR. As such, Hypothesis 2C was partially confirmed.

23 Without achieving exact classification accuracy.
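Overall prediction success is the case-weighted average of the two group hit rates. The Python sketch below reproduces the block 2 percentages. It assumes the underlying counts are 24 of 26 UR and 22 of 24 BR participants correctly classified. Those counts are inferred from the reported percentages and are not stated in the text.

```python
def prediction_success(correct_ur: int, n_ur: int, correct_br: int, n_br: int):
    """Percent correctly classified in each group and overall."""
    ur_rate = 100 * correct_ur / n_ur
    br_rate = 100 * correct_br / n_br
    # The overall rate pools all cases, so each group counts in proportion to its size.
    overall = 100 * (correct_ur + correct_br) / (n_ur + n_br)
    return round(ur_rate, 1), round(br_rate, 1), round(overall, 1)


print(prediction_success(24, 26, 22, 24))  # block 2: RDS-R + MSVT FR
print(prediction_success(26, 26, 24, 24))  # exact classification after block 3
```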

Specific  Aim  3 

To  evaluate  BEAM  and  embedded  RVT  performance  between  simulator  study 
participants  and  valid  responders  with  a  history  of  mild  traumatic  brain  injury 
(Hypotheses  3A-3B). 

At the time the prospective, experimental study's data collection was completed,
data for forty-five subjects in the parent study's TBI sample had been collected. Of the
thirty subjects meeting criteria for "mild TBI," two were excluded for having medical
conditions that interfered with neurocognitive testing (i.e., optic nerve tumor, severe
diplopia), and nine were excluded for failing one or more RVTs at cutoffs with 90%
specificity (five failed only one RVT and four failed two or more RVTs; see Table 2 for
cutoff scores). Five subjects failed CPT-II Commissions, four failed CPT-II Perseverations,
two failed CPT-II Hit RT SE, two failed CPT-II Omissions, two failed Digit Span ACSS,
and two failed Digit Span RDS.
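A cutoff with 90% specificity is one that at most 10% of genuinely valid responders fail. The study applied the cutoff scores listed in Table 2. The Python sketch below only illustrates how such a cutoff could be derived from a reference sample of valid responders. It assumes higher scores are more suspicious (as with CPT-II Commissions), and the reference scores are hypothetical.

```python
import math


def cutoff_for_specificity(valid_scores, specificity=0.90):
    """Smallest observed score at or below which at least `specificity`
    of valid responders fall (higher scores = more suspicious)."""
    ordered = sorted(valid_scores)
    k = math.ceil(specificity * len(ordered)) - 1
    return ordered[k]


def fails(score, cutoff):
    """A score above the cutoff is treated as a failed validity indicator."""
    return score > cutoff


reference = [3, 5, 2, 8, 4, 6, 1, 7, 10, 9]  # hypothetical valid-responder scores
cut = cutoff_for_specificity(reference)
print(cut, sum(fails(s, cut) for s in reference) / len(reference))
```

For measures where lower scores are more suspicious (e.g. Digit Span), the same logic applies after negating the scores.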

Nineteen  participants  with  a  history  of  mild  TBI  met  eligibility  criteria  for 
inclusion  in  this  study.  Demographic  and  injury  information  for  the  Unbiased  Responder 
with  a  history  of  mild  TBI  (UR-mTBI)  group  is  shown  in  Table  16.  The  UR-mTBI  group 
did not differ significantly from either the BR or UR group on age, years of formal
education, estimated premorbid intelligence, gender, or race/ethnicity. The median length
of time since head injury in the UR-mTBI group was 6.9 years (IQR: 2.32 - 21.6 years).
The average length of unconsciousness was 3.00 minutes (SD = 4.29 minutes), and the
average  length  of  post-traumatic  amnesia  was  18.1  minutes  (SD  =  50.8  minutes). 

A series of one-way ANOVAs was used to test for BEAM and embedded RVT
differences among the BR, UR, and UR-mTBI groups. To reduce the probability of Type I
error, only normally distributed variables with AUC > 0.9 were considered for these
analyses. Additionally, a Bonferroni correction was applied, so all effects are reported at
a .003 level of significance. As shown in Table 17, significant group differences were
found for three embedded RVT variables: ACSS, F(2, 66) = 38.4, p < .001; RDS, F(2,
66) = 32.0, p < .001; and CPT-II Commissions, F(2, 66) = 29.7, p < .001. Significant
group differences were also found for seven BEAM variables: SacRT-IIV-DC, F(2, 66)
= 36.6, p < .001; SacRT-IIV-UCG, F(2, 65) = 26.6, p < .001; SacRT-IIV-Overall, F(2,
66) = 47.0, p < .001; ManRT-IIV-DC, F(2, 65) = 54.6, p < .001; ManRT-IIV-NDC, F(2,
65) = 41.0, p < .001; ManRT-IIV-MDC, F(2, 60) = 33.6, p < .001; and ManRT-IIV-
Overall, F(2, 65) = 53.0, p < .001. Because Levene's test indicated unequal variances for
SacRT-IIV-MDC, F(2, 66) = 4.37, p = .02, Welch's F test was used, and significant group
differences were found, Welch's F(2, 37.7) = 26.0, p < .001.
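For SacRT-IIV-MDC, unequal variances led to Welch's F test. Its denominator df (37.7) is estimated from the data rather than fixed at N - k. The Python sketch below implements the standard Welch one-way ANOVA formulas. With two groups, the statistic reduces to the square of Welch's t. The three toy groups in the usage line are hypothetical.

```python
def welch_anova(groups):
    """Welch's one-way ANOVA for k independent groups with possibly unequal
    variances. Returns (F, df1, df2)."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [sum(g) / len(g) for g in groups]
    variances = [sum((x - m) ** 2 for x in g) / (len(g) - 1)
                 for g, m in zip(groups, means)]
    weights = [n / v for n, v in zip(sizes, variances)]  # precision weights n_i / s_i^2
    w_total = sum(weights)
    grand = sum(w * m for w, m in zip(weights, means)) / w_total  # weighted grand mean
    between = sum(w * (m - grand) ** 2 for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / w_total) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    f = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df2 = (k ** 2 - 1) / (3 * lam)
    return f, k - 1, df2


# Hypothetical RT-variability scores (ms) for three groups with unequal spread.
br = [120.0, 95.0, 150.0, 180.0, 110.0, 160.0]
ur = [60.0, 55.0, 70.0, 65.0, 58.0, 62.0]
ur_mtbi = [66.0, 59.0, 75.0, 61.0, 70.0, 64.0]
print(welch_anova([br, ur, ur_mtbi]))
```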

Post-hoc Tukey HSD tests using a .05 level of significance identified the BR group
as having significantly higher BEAM RT variability and more CPT-II Commission errors than
the UR and UR-mTBI groups (Cohen's d range: 1.35 - 2.68). Furthermore, the BR group had
significantly  lower  WAIS-IV  Digit  Span  ACSS  and  RDS  scores  than  the  UR  and  UR- 
mTBI  groups  (Cohen’s  d  range:  1.94  -  2.23).  The  UR  and  UR-mTBI  groups  did  not 
significantly  differ  on  BEAM  RT  variability,  ACSS,  RDS,  or  CPT-II  Commissions. 

The eight non-normally distributed BEAM and embedded RVT variables with
AUC > 0.9 were subjected to Kruskal-Wallis tests. As shown in Table 18, there were
significant group differences for four embedded RVT variables: RDS-R, H(2) = 36.4, p
< .001; ARDS, H(2) = 32.9, p < .001; CPT-II Omissions, H(2) = 32.1, p < .001; and CPT-
II Hit RT SE, H(2) = 27.5, p < .001. Additional Kruskal-Wallis tests, also shown in
Table 18, identified significant group differences among four BEAM variables: SacCom%,
H(2) = 33.6, p < .001; ManOm%, H(2) = 38.1, p < .001; ManRT-IIV-UCG, H(2) = 26.5,
p < .001; and ManRT-IIV-UC, H(2) = 26.8, p < .001. Post-hoc Mann-Whitney U tests
revealed no significant differences between the UR and UR-mTBI groups on any of the eight
non-normally distributed BEAM or embedded RVT variables. However, significant
differences with medium-to-large effect sizes (r range: .54 - .71) were found for all eight
variables between the UR-mTBI and BR groups. Classification accuracy statistics for the
BR, UR, and UR-mTBI groups are shown in Tables 19 and 20. Based on these results,
both Hypotheses 3A and 3B were confirmed.
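The Kruskal-Wallis H statistic ranks all scores across the BR, UR, and UR-mTBI groups together. It then tests whether the groups' mean ranks differ more than chance allows. With three groups, H is referred to a χ² distribution with 2 degrees of freedom, hence H(2). The Python sketch below uses average ranks for ties and the usual tie correction. The example scores are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H with the usual tie correction; df = len(groups) - 1."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, offset = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])
        h += rank_sum ** 2 / len(g)
        offset += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


# Hypothetical omission counts for BR, UR, and UR-mTBI participants.
print(kruskal_wallis_h([[6, 9, 4, 12], [0, 1, 0, 2], [1, 0, 2, 1]]))
```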
CHAPTER 4: Discussion


Summary  and  Integration  of  Results 

This dissertation project sought to evaluate a novel eye-tracking tool's ability to
detect invalid responding in neurocognitive assessment relative to existing response
validity tests (RVTs) and metrics. A well-controlled, prospective, experimental simulator
study with a sample large enough (n = 50) to adequately power the analyses was conducted
to identify the variables from the eye-tracking tool, as well as from embedded and
freestanding RVTs, that most accurately classified simulators and controls. Results obtained from the simulator
study  were  compared  to  parent  study  data  of  research  participants  with  a  history  of  mild 
TBI  in  order  to  cross-validate  experimental  results  with  a  clinical  sample  of  interest. 

The first aim of the project was to identify variables from the Bethesda Eye &
Attention Measure (BEAM) that demonstrated the best classification accuracy among
biased and unbiased responders. Twenty-five of the twenty-nine BEAM variables
submitted to ROC analyses demonstrated acceptable-to-outstanding (AUC: 0.71 - 0.97)
classification accuracy: ten reaction time (RT) variables, twelve reaction time
intra-individual variability (RT-IIV) variables, two commission error variables (Com%),
and one omission error variable (Om%). Ten RT-IIV variables, SacCom%, and ManOm%
demonstrated outstanding classification accuracy (AUC > 0.9). Saccadic and Manual
RT-IIV-Overall demonstrated the best classification accuracy among BEAM variables
(AUC = 0.97). Between-groups analyses of the twelve BEAM variables with outstanding
classification accuracy identified significant (p < .001) differences between the biased
and unbiased responding groups with large effect sizes. Compared to the unbiased
responding (UR) control group, the biased responding (BR) group demonstrated slower
saccadic and manual reaction times; greater saccadic and manual reaction time
intra-individual variability; more saccadic and manual commission errors; and more manual
omission errors. The variable that represented compliance with test instructions failed to
achieve acceptable classification accuracy. As such, Hypotheses 1C and 1D were
confirmed. Hypotheses 1B and 1E were partially confirmed, and Hypothesis 1A was not
confirmed.

The second aim of the project was to determine the incremental predictive value of
BEAM  variables  above  and  beyond  existing  indices  and  tests  used  to  identify  invalid 
responding.  After  taking  the  BEAM,  simulator  study  participants  completed  the  Trail 
Making  Test  (TMT)  A  &  B,  Conners’  Continuous  Performance  Test-Second  Edition 
(CPT-II),  and  the  WAIS-IV  Digit  Span  Subtest.  Each  of  these  three  tests  contains  well- 
researched  embedded  validity  metrics  that  can  be  used  to  detect  invalid  responding. 
Thirteen  of  the  eighteen  embedded  validity  metrics  from  these  tests  demonstrated 
acceptable-to-outstanding  (AUC:  0.74  -  0.94)  classification  accuracy:  seven  from  the 
CPT-II,  four  from  the  WAIS-IV  Digit  Span,  and  two  from  TMT  A  &  B.  Seven  of  these 
variables — CPT-II  Commissions,  Omissions,  and  Hit  RT  SE  and  WAIS-IV  Digit  Span 
RDS,  RDS-R,  ACSS,  and  ARDS — demonstrated  outstanding  classification  accuracy  in 
the  simulator  study  sample.  RDS-R  was  found  to  have  the  best  overall  classification 
accuracy  among  the  embedded  RVT  variables  (AUC  =  0.94).  Between-groups  analyses 
of  embedded  RVT  variables  with  outstanding  classification  accuracy  identified 
significant (p < .001) differences between the biased and unbiased responding groups
with  large  effect  sizes. 
Simulator study participants also took two freestanding RVTs — the Victoria
Symptom  Validity  Test  (VSVT)  and  the  Medical  Symptom  Validity  Test  (MSVT) — as 
part  of  their  assessment  battery.  All  fifteen  freestanding  RVT  variables  submitted  to 
ROC  analyses  demonstrated  excellent-to-outstanding  (AUC:  0.85  -  1.00)  classification 
accuracy  in  the  simulator  study  sample:  nine  from  the  VSVT  and  six  from  the  MSVT. 
Twelve  of  these  variables  demonstrated  outstanding  classification  accuracy.  Two  VSVT 
variables — Difficult  Correct  and  Total  Correct — demonstrated  perfect  classification 
accuracy  and  could  not  be  submitted  to  the  planned  hierarchical  logistic  regression 
models in SPSS[24]. As such, the MSVT Free Recall (FR) % Correct variable was found to
have the best overall classification accuracy among the freestanding RVT variables
permissible for logistic regression analyses. Similar to BEAM and embedded RVT
findings,  between-groups  analyses  of  freestanding  RVT  variables  with  outstanding 
classification  accuracy  identified  significant  (p  <  .001)  differences  between  the  biased 
and  unbiased  responding  groups  with  large  effect  sizes. 

Hierarchical logistic regressions using the best and most representative variables
from the BEAM (SacRT-IIV-Overall/ManRT-IIV-Overall), embedded RVTs (RDS-R),
and freestanding RVTs (MSVT FR) were conducted. The best BEAM variable
outperformed the best embedded RVT variable, with SacRT-IIV-Overall demonstrating
incremental predictive ability above and beyond RDS-R. As such, Hypothesis 2A was
confirmed. The best BEAM variable outperformed the third best freestanding RVT
variable, with ManRT-IIV-Overall demonstrating incremental predictive value above and
beyond MSVT FR. As such, Hypothesis 2B was partially confirmed. In a third model,
MSVT FR demonstrated incremental predictive ability above and beyond RDS-R, but the
addition of either SacRT-IIV-Overall or ManRT-IIV-Overall to the model produced
perfect classification accuracy and overfit the model. As such, Hypothesis 2C was
partially confirmed.

24 It is impossible to improve the predictive value of a model above and beyond perfect group
classification. While these variables were clearly the best and most representative freestanding RVT
variables, they could not be used to test Hypotheses 2B and 2C.

25 That could be submitted to the logistic regression analyses.

The third and final aim of the project was to compare BEAM and embedded RVT
performance between simulator study participants and unbiased responders with a history
of mild TBI (UR-mTBI). Variables that demonstrated outstanding classification
accuracy in the simulator study were submitted to between-groups analyses and post-hoc
tests.  None  of  the  nineteen  BEAM  and  embedded  RVT  variables  with  outstanding 
classification  accuracy  differed  significantly  (p  <  .05)  between  the  UR  and  UR-mTBI 
groups,  but  all  nineteen  variables  differed  significantly  (p  <  .05)  between  the  BR  and  UR- 
mTBI  groups  with  medium-to-large  effect  sizes.  The  BR  group  consistently 
demonstrated  greater  saccadic  (BEAM)  and  manual  (BEAM,  CPT-II)  reaction  time  intra¬ 
individual  variability;  more  saccadic  (BEAM)  and  manual  (BEAM,  CPT-II)  commission 
errors;  more  manual  (BEAM,  CPT-II)  omission  errors;  and  poorer  WAIS-IV  Digit  Span 
performance  than  the  UR-mTBI  group.  As  such,  Hypotheses  3A  and  3B  were  confirmed. 

Explanations  for  Findings 

Overall, the prospective, experimental simulator study produced a higher-than-expected
number of BEAM, embedded RVT, and freestanding RVT variables with
acceptable-to-outstanding classification accuracy for invalid responding. Specifically,
53 of 62 variables (85.5%) achieved the study’s baseline for inclusion in subsequent
analyses. In light of these initial findings, only variables with outstanding classification
accuracy (AUC > 0.9) in the prospective, experimental design were submitted to
additional analyses. Of the 62 original variables, 31 (50.0%) remained after applying
this stricter cutoff: 12 BEAM variables, 7 embedded RVT variables, and 12
freestanding RVT variables.
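The AUC screening described above can be illustrated with a minimal Python sketch. It assumes that higher scores indicate more invalid responding and uses the Mann-Whitney interpretation of AUC (the probability that a randomly chosen biased responder outscores a randomly chosen unbiased responder). The descriptive bands mirror how this document labels AUCs (below 0.7 unacceptable, 0.7 to 0.8 acceptable, 0.8 to 0.9 excellent, above 0.9 outstanding). The function names and example scores are hypothetical and are not output from the BEAM scoring software.

```python
def empirical_auc(biased, unbiased):
    """AUC as P(biased score > unbiased score), counting ties as one half."""
    wins = 0.0
    for b in biased:
        for u in unbiased:
            if b > u:
                wins += 1.0
            elif b == u:
                wins += 0.5
    return wins / (len(biased) * len(unbiased))


def auc_label(auc):
    """Map an AUC onto the descriptive bands used in this document (assumed boundaries)."""
    if auc > 0.9:
        return "outstanding"
    if auc >= 0.8:
        return "excellent"
    if auc >= 0.7:
        return "acceptable"
    return "unacceptable"


# Hypothetical RT-IIV scores (seconds) for four biased and four unbiased responders.
biased_scores = [0.15, 0.18, 0.12, 0.20]
unbiased_scores = [0.07, 0.09, 0.13, 0.08]
auc = empirical_auc(biased_scores, unbiased_scores)
print(auc, auc_label(auc))  # 15 of 16 pairs favor the biased group: 0.9375 outstanding
```

Under this sketch, a variable would be carried forward to the additional analyses only if its AUC fell in the outstanding band. The study's ROC software and its confidence intervals are not reproduced here.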

Because this study was the first to evaluate the ability of a novel eye-tracking
tool, the BEAM, to identify invalid responding, all 29 BEAM variables derived from
the custom-built scoring software were initially submitted to ROC analyses. This
approach identified several saccadic and manual metrics that were able to detect invalid
responding in a sample of healthy participants. The primary research question of the
dissertation project (“Can eye movements be used to detect invalid responding?”) was
therefore answered in the affirmative.

Saccadic and manual variables exhibited several notable trends in
how well they classified invalid responding. In both saccadic and manual modalities,
reaction  time  intra-individual  variability  (RT-IIV)  metrics  generally  outperformed 
reaction  time  (RT)  metrics  in  classifying  invalid  responding.  RT-IIV  metrics  ranged 
from  excellent-to-outstanding  (AUC:  0.85  -  0.97),  whereas  RT  metrics  ranged  from 
unacceptable-to-excellent  (AUC:  0.60  -  0.89).  Ten  of  the  twelve  BEAM  variables  with 
outstanding  classification  accuracy  were  RT-IIV  metrics:  four  SacRT-IIV  variables  and 
six  ManRT-IIV  variables. 

Qualitatively, manual RT and RT-IIV metrics appeared to outperform their
respective saccadic RT and RT-IIV metrics in detecting invalid responding. ManRT-IIV
variables were strong among the five non-DCR trial types (AUC: 0.92 - 0.97) and
overall variability (AUC = 0.97, 95% CI: 0.92 - 1.00). Four of the five BEAM variables
with the highest AUC were ManRT-IIV metrics with 95% AUC confidence intervals
between 0.90 and 1.00. These results could be explained by the greater reliability of
manual metrics compared to saccadic metrics (18). While SacRT-IIV variables were
relatively weaker among the non-DCR trial types (AUC: 0.85 - 0.93),
SacRT-IIV-Overall (AUC = 0.97, 95% CI: 0.93 - 1.00) emerged as one of the two best BEAM
variables for identifying invalid responding in the simulator study. It is possible that the
way SacRT-IIV-Overall is calculated (i.e., as the average of the trial-type standard deviations)
enhances its internal reliability and contributes to its superior performance relative to the
other saccadic metrics. It is also possible that the more automatic nature of saccadic
responses to visual stimuli may render them less vulnerable to invalid responding, an
interpretation supported by previous research (19; 117). Based on this initial study, it is
clear that both modalities offered metrics that were able to identify invalid responding
with outstanding classification accuracy. Additional studies examining both the
complementary and the independent contributions that saccadic and manual metrics each
provide to the evaluation of cognitive performance would help explain the findings in this
study.
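The text describes SacRT-IIV-Overall as the average of the trial-type standard deviations. A minimal Python sketch of that calculation follows. It assumes sample standard deviations (n - 1 denominator) and no outlier trimming. The trial-type keys and reaction times are hypothetical, and the BEAM's custom scoring software may differ in these details.

```python
from statistics import mean, stdev


def overall_rt_iiv(rts_by_trial_type):
    """Average of per-trial-type RT standard deviations (seconds).

    Assumptions: sample SD (n - 1), trial types with fewer than two valid RTs
    are skipped, and no outlier trimming is applied.
    """
    per_type_sd = {
        trial_type: stdev(rts)
        for trial_type, rts in rts_by_trial_type.items()
        if len(rts) >= 2
    }
    return mean(per_type_sd.values()), per_type_sd


# Hypothetical saccadic RTs (seconds) for two trial types.
rts = {
    "type_a": [0.20, 0.22, 0.24],
    "type_b": [0.30, 0.30, 0.36],
}
overall, by_type = overall_rt_iiv(rts)
print(round(overall, 4), {k: round(v, 4) for k, v in by_type.items()})
```

Averaging within-type standard deviations keeps mean RT differences between trial types from inflating the variability estimate, unlike a single SD pooled over all trials. This is one plausible reason such a composite could be more reliable, offered as an interpretation rather than a finding of the study.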

Interestingly,  BEAM  commission  and  omission  errors  manifested  differently 
between  saccadic  and  manual  responses.  Both  SacCom%  and  ManOm%  demonstrated 
outstanding  classification  accuracy.  By  comparison,  ManCom%  had  AUC  =  0.80  and 
SacOm%  had  AUC  =  0.60.  While  commission  errors  (i.e.,  looking  or  pressing  the  button 
during  the  inhibition  trials)  appeared  to  have  some  utility  in  both  response  modalities, 
saccadic  commissions  appeared  to  be  more  sensitive  to  invalid  responding  than  manual 
commissions. 
Conversely, manual omission errors were much more likely to occur in the BR
group than saccadic omission errors. While it was rare in both biased and unbiased
groups for participants to omit any saccadic response after a target circle appeared,
manual omission errors were significantly more likely to occur in the BR group. It is
possible this finding was influenced by how the scoring software defines omission errors
(i.e., no response within a 1000 ms window on non-inhibition trials). Any button press occurring
more than 1000 ms after the target circle appeared would be counted as a manual
omission, just as if no button press had occurred at all. Given that manual reaction time is
consistently longer than saccadic reaction time on the BEAM (18), omission errors may
have been more likely to be registered in the manual modality. Still, it was surprising to
see such a disparity between modalities, with ManOm% classification accuracy in
the outstanding range and SacOm% in the unacceptable range.
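The omission rule described above can be made concrete with a short Python sketch. It encodes what the text states: a non-inhibition trial with no response within 1000 ms is an omission, so a late press is scored the same as no press. It also adds one labeled assumption, that commissions on inhibition trials are judged within the same 1000 ms window. The function names and trial records are hypothetical.

```python
from typing import Optional, Sequence, Tuple

WINDOW_MS = 1000  # response window stated in the text


def score_trial(is_inhibition_trial: bool, latency_ms: Optional[float]) -> str:
    """Score one trial; latency_ms is None when no response was recorded."""
    responded_in_window = latency_ms is not None and latency_ms <= WINDOW_MS
    if is_inhibition_trial:
        # Assumption: a response inside the same window counts as a commission.
        return "commission" if responded_in_window else "correct_inhibition"
    # A late response (> 1000 ms) is treated exactly like no response.
    return "hit" if responded_in_window else "omission"


def omission_percent(trials: Sequence[Tuple[bool, Optional[float]]]) -> float:
    """Omission errors as a percentage of non-inhibition trials."""
    scored = [score_trial(inhib, lat) for inhib, lat in trials if not inhib]
    return 100.0 * scored.count("omission") / len(scored)


# A 1050 ms button press is scored as an omission, while a 250 ms saccade is a hit.
print(score_trial(False, 1050), score_trial(False, 250))  # omission hit
```

Manual latencies sit closer to the edge of the window than saccadic latencies. A fixed-window rule like this one would therefore register more manual omissions at comparable levels of effort, which is the mechanism the paragraph proposes.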

Several  findings  among  the  embedded  and  freestanding  RVTs  met  or  exceeded 
expectations  of  their  classification  accuracy.  Consistent  with  previous  literature  (42;  234; 
269),  the  WAIS-IV  Digit  Span  and  CPT-II  demonstrated  better  capability  to  detect 
invalid  responding  than  the  Trail  Making  Test  A  &  B.  Not  surprisingly,  all  fifteen 
variables  from  the  VSVT  and  MSVT — tests  that  were  designed  to  detect  invalid 
responding — demonstrated  excellent-to-outstanding  classification  accuracy.  Consistent 
with  the  literature  (248),  the  VSVT  Difficult  Correct  (and  by  extension,  Total  Correct) 
variable  obtained  the  best  overall  classification  accuracy  in  the  simulator  study. 

Classification accuracy statistics for BEAM, embedded RVT, and freestanding
RVT cutoff scores were strong across all three test types. Using Larrabee’s recommended
0.85 minimum specificity level for detecting invalid responding (153; 154), BEAM variables
demonstrated sensitivity levels ranging from 0.74 - 0.96. Using the same specificity criterion,
embedded RVT variable sensitivities ranged from 0.83 - 0.96 and freestanding RVT variable
sensitivities ranged from 0.75 - 1.00. Collectively, all 31 variables with outstanding
classification accuracy across the three test types were able to detect at least roughly three
quarters of the biased responders while maintaining 85% specificity. It should also be noted
that the base rate of invalid responding in the simulator study was 48% (24 out of 50 were
biased responders). While this experimental base rate approximates actual base rates in
real-world clinical contexts (see Base Rates section above), smaller base rates (in the 10-20%
range) would yield lower PPVs and higher NPVs than those reported in Table 7.
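The effect of base rate on predictive values follows from Bayes' theorem. The Python sketch below uses the SacRT-IIV-Overall cutoff reported later in this chapter (88% sensitivity, 85% specificity in the UR group) purely as illustrative inputs. The resulting PPVs and NPVs are hypothetical and are not the values reported in Table 7.

```python
def predictive_values(sensitivity, specificity, base_rate):
    """Return (PPV, NPV) for a test applied at a given base rate of invalid responding."""
    true_pos = sensitivity * base_rate
    false_neg = (1 - sensitivity) * base_rate
    true_neg = specificity * (1 - base_rate)
    false_pos = (1 - specificity) * (1 - base_rate)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


for rate in (0.48, 0.20, 0.10):
    ppv, npv = predictive_values(0.88, 0.85, rate)
    print(f"base rate {rate:.0%}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

With these inputs, PPV falls from about 0.84 at the study's 48% base rate to about 0.39 at a 10% base rate, while NPV rises from about 0.88 to about 0.98.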

As  intended,  the  well-controlled,  prospective  experimental  design  rendered 
significant differences in neurocognitive assessment performance between the biased and
unbiased  responders.  The  randomized,  permuted  block  assignment  with  a  substantial 
sample  size  (n  =  50)  rendered  demographically  similar  groups.  Examiner  scoring  bias 
was  mitigated  by  blinding  and  computer-automated,  objective  data  collection.  Consistent 
with  Sollman  and  Berry’s  (248)  meta-analytic  findings,  providing  participants  with  a 
warning  to  fake  believably  achieved  a  large  performance  disparity  between  biased  and 
unbiased  responders.  Collectively,  these  findings  suggest  the  research  design  effectively 
isolated  the  “invalid  responding”  construct,  enhancing  the  internal  validity  of  the 
simulator  study  results. 
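For readers unfamiliar with permuted block assignment, the Python sketch below shows the general technique. Within each block, equal numbers of each condition are shuffled, so group sizes can never drift far apart. The block size of 4, the seed, and the function name are assumptions for illustration, because this section does not report the block size used in the study.

```python
import random
from collections import Counter


def permuted_block_assignment(n, block_size=4, arms=("unbiased", "biased"), seed=None):
    """Assign n participants to arms in randomly permuted, balanced blocks."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    schedule = []
    while len(schedule) < n:
        block = [arm for arm in arms for _ in range(per_arm)]
        rng.shuffle(block)  # random order within the block, balance preserved
        schedule.extend(block)
    return schedule[:n]  # the final block may be truncated


schedule = permuted_block_assignment(50, seed=2015)
print(Counter(schedule))
```

Only the final, possibly incomplete block can be unbalanced. The two arms can therefore differ by at most half a block, which is two participants in this sketch.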

As  internal  validity  increases,  external  validity  usually  decreases.  Not 
surprisingly,  the  embedded  and  freestanding  RVT  variables’  classification  accuracies 
were  generally  higher  in  the  simulator  study  than  previous  studies  using  clinical 
populations. Multiple studies have reported CPT-II Commissions, Omissions, and Hit
RT  SE  variables  having  point  AUCs  in  the  acceptable  range  (0.7  <  AUC  <  0.8)  among 
clinical  referrals  for  neuropsychological  testing  (42;  149;  198).  In  a  retrospective  study 
of  veterans  referred  for  neuropsychological  assessment,  Young  and  colleagues  (284) 
reported  95%  AUC  confidence  intervals  in  the  unacceptable-to-acceptable  (0.6  <  AUC  < 
0.8)  range  for  WAIS-IV  Digit  Span  ACSS,  RDS,  and  RDS-R  variables.  Similarly,  Busse 
and  Whiteside  (42)  reported  TMT  A  having  unacceptable  (AUC  =  0.62)  classification 
accuracy  while  TMT  B  demonstrated  acceptable  (AUC  =  0.76)  classification  accuracy 
among  a  sample  of  413  consecutive  neuropsychological  referrals. 

The  relatively  higher  AUCs  in  this  project’s  simulator  study  may  be  explained  by 
the  non-clinical  subject  pool  and  well-controlled  experimental  manipulation  of  invalid 
responding.  Consistent  with  the  UR-mTBI  group’s  somewhat  poorer  neurocognitive 
performance relative to the UR group, the AUCs for BEAM and embedded RVT
variables  were  somewhat  lower  when  comparing  BR  and  UR-mTBI  groups. 
Classification  accuracy  statistics  for  BEAM,  embedded  RVT,  and  freestanding  RVT 
variables  obtained  in  the  simulator  study  would  likely  be  lower  in  more  heterogeneous, 
clinical  samples. 

The simulator study’s statistical methodology used a two-pronged approach to
identifying the best detectors of invalid responding within each test type. First, ROC
analyses were used in an exploratory manner to reduce a large number of variables to a
more manageable subset. These ROC curves provided a cross-sectional snapshot of
overall performance. Next, stepwise logistic regressions were performed on this reduced
set of variables to compare their group classification prediction capabilities more
systematically. The logistic regressions were able to distill the maximum benefit from
multiple scores and indices, making the technique particularly useful for identifying new
embedded response validity indices in the BEAM (181). The logistic regression models
accounted for fluctuating response behavior throughout the battery by including
variables from multiple measures obtained at several time points (36; 234). Results from
stepwise regression analyses were compared with joint variable ROC analyses (e.g.,
CPT-II Commissions + WAIS-IV Digit Span RDS-R) to identify the most representative
variables from each test type.

The  ROC  and  stepwise  logistic  regression  analyses  independently  identified  the 
same  variables  from  each  test  type  as  being  the  best  predictors  of  invalid  responding: 
SacRT-IIV-Overall (AUC = 0.97) and ManRT-IIV-Overall (AUC = 0.97) from the
BEAM;  WAIS-IV  Digit  Span  RDS-R  (AUC  =  0.94)  from  embedded  RVTs;  and  VSVT 
Difficult  Correct  (AUC  =  1.00),  VSVT  Total  Correct  (AUC  =  1.00),  and  MSVT  FR 
(AUC  =  0.96)  from  freestanding  RVTs.  The  best  variables  identified  in  Aim  1  of  this 
study,  as  shown  in  Table  4,  were  the  same  variables  that  were  identified  by  a  series  of 
stepwise  logistic  regression  models.  These  convergent  findings  from  separate  statistical 
methodologies  enhanced  the  confidence  that  the  variables  submitted  to  hierarchical 
regression analyses would best represent their respective test types. Of note, either a
saccadic or a manual metric from the BEAM could independently be used to represent the
BEAM.

Earlier in the study, eye movement metrics were shown to be useful for detecting
invalid responding; hierarchical logistic regressions were then used to determine how useful
they were compared to existing measures (the second part of this study’s primary
research question). Analyses revealed that the BEAM demonstrated considerable
predictive ability for invalid responding above and beyond embedded RVTs. While two
freestanding RVT variables with perfect classification accuracy outperformed the BEAM
in terms of AUC, it could not be determined whether VSVT Difficult/Total Items Correct
significantly improved group prediction above and beyond the BEAM. It was
nonetheless encouraging to find that the BEAM demonstrated moderate predictive ability
for invalid responding above and beyond all other freestanding RVT variables, including
all MSVT variables.

Consistent  with  the  literature  (179),  freestanding  RVTs  (not  including  VSVT 
Difficult/Total  Correct  variables)  demonstrated  incremental  predictive  value  above  and 
beyond embedded RVTs in the simulator study. The addition of either a saccadic or a
manual BEAM variable to this Embedded RVT + Freestanding RVT model achieved
perfect classification accuracy. While this finding indicates an objective improvement in
group prediction, limitations of the statistical methodology preclude determining
whether the BEAM significantly added to the Embedded RVT + Freestanding RVT
model.

Collectively,  the  prospective  simulator  study  results  provide  compelling  evidence 
that the BEAM’s saccadic and manual metrics may serve as powerful tools for detecting
invalid responding. The results suggest that the BEAM’s saccadic and manual metrics
could  be  used  to  detect  invalid  responding  similar  to  the  well-researched  embedded  RVT 
measures  in  the  CPT-II,  TMT,  and  WAIS-IV.  The  simulator  study  results  also  provide 
preliminary  evidence  that  BEAM  metrics  may  perform  comparably  to  the  MSVT  and 
VSVT  at  detecting  invalid  responding. 
To enhance the generalizability of the simulator study’s findings, the experimental
study results needed to be compared to a group with a known clinical condition. Mild
TBI populations are known for having high rates of invalid responding (44; 100; 183),
and the parent study enabled BEAM and embedded RVT variables to be compared to a
group with this condition. Conservative screening methods increased confidence that the
mild TBI participants were performing at capacity levels (i.e., valid responding).

Because the clinical comparison group did not differ demographically from the UR or BR
groups, group analyses could be conducted without controlling for demographic
variables, which would have reduced statistical power. While freestanding RVT comparison
data were unavailable, analyses could be conducted with the 19 BEAM and embedded RVT
variables that demonstrated outstanding classification accuracy.

While  the  UR-mTBI  group  had  somewhat  poorer  BEAM  and  embedded  RVT 
scores than the UR group, the performance differences were not significant. These null
findings  could  be  explained  by  the  UR-mTBI  group’s  median  time  since  injury  of  6.9 
years  and  an  above-average  premorbid  intelligence.  It  is  possible  that  any  effects  of 
neurological  injury  at  time  of  testing  were  attenuated  by  time,  cognitive  ability,  or 
behavioral  factors  that  limited  the  ability  of  conventional  neuropsychological  testing  to 
identify  significant  impairment  (72).  It  is  also  possible  that  this  study  was  underpowered 
to detect significant differences in neurocognitive performance among persons with mild
TBI  and  healthy  controls.  Nonetheless,  the  UR-mTBI  group  served  a  useful  purpose  in 
this  study  by  providing  a  clinical  comparison  group  to  healthy  groups  of  biased  and 
unbiased  responders.  In  contrast  to  the  UR/UR-mTBI  group  comparisons,  the  BR  group 
performed significantly worse than the UR-mTBI group on all 19 BEAM, CPT-II, and
WAIS-IV  Digit  Span  variables  with  medium-to-large  effect  sizes. 

Further  analyses  of  cutoff  scores  in  the  clinical  sample  enhanced  the 
generalizability  of  the  simulator  study’s  findings.  BEAM  variables  generally 
demonstrated  excellent  sensitivity  (0.61  -  0.92)  to  invalid  responding  while  maintaining 
0.85  or  greater  specificity  in  the  UR-mTBI  group.  This  clinical  sensitivity  range  for 
BEAM  variables  is  comparable  to  the  0.74  -  0.96  range  obtained  in  the  simulator  study, 
suggesting that the BEAM’s invalid responding detection performance may not be
significantly  degraded  in  validly  responding  mild  TBI  populations.  By  contrast,  the 
WAIS-IV  Digit  Span  and  CPT-II  variables  demonstrated  sensitivity  levels  ranging  from 
0.50  -  0.75  with  specificity  levels  of  0.85  or  above  in  the  UR-mTBI  group.  This  clinical 
sensitivity  range  for  embedded  RVT  variables  is  much  lower  than  the  0.83  -  0.96 
sensitivity  range  obtained  in  the  simulator  study  and  more  consistent  with  previous 
embedded  RVT  research  with  clinical  samples  (42;  149;  198;  284). 

Despite the absence of significant group differences between the UR and UR-mTBI
groups on BEAM, CPT-II, and WAIS-IV Digit Span performance, it was not surprising to
see cutoff scores with specificity at or about 0.85 in the control group result in lower
specificity in the clinical group. This lowered clinical specificity resulted in higher
numbers  of  false  positives  across  several  BEAM  and  embedded  RVT  variables,  including 
ones  identified  as  being  the  best  and  most  representative  variables  in  the  simulator  study. 
SacRT-IIV-Overall  scores  of  0.108  sec  or  greater  had  88%  sensitivity  to  invalid 
responding  and  85%  specificity  in  the  UR  group,  but  the  same  cutoff  score  demonstrated 
63%  specificity  in  the  UR-mTBI  group.  ManRT-IIV-Overall  scores  of  0.092  sec  or 
greater had 96% sensitivity and 85% specificity in the UR group, but also demonstrated
63%  specificity  in  the  UR-mTBI  group.  WAIS-IV  Digit  Span  RDS-R  scores  of  15  or 
less  had  92%  sensitivity  and  88%  specificity  in  the  UR  group,  but  demonstrated  63% 
specificity  in  the  UR-mTBI  group. 
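The pattern above can be illustrated with a minimal Python sketch: a cutoff chosen for about 85% specificity in healthy controls loses specificity in a clinical group. The scores below are invented for illustration and are not study data. The rule "flag scores at or above the cutoff" and the function names are assumptions. For variables such as RDS-R, where lower scores indicate invalid responding, the comparison would be reversed.

```python
def specificity(cutoff, valid_scores):
    """Share of valid responders NOT flagged (score below cutoff)."""
    return sum(s < cutoff for s in valid_scores) / len(valid_scores)


def sensitivity(cutoff, biased_scores):
    """Share of biased responders flagged (score at or above cutoff)."""
    return sum(s >= cutoff for s in biased_scores) / len(biased_scores)


def cutoff_for_min_specificity(valid_scores, min_spec=0.85):
    """Lowest observed cutoff that keeps specificity >= min_spec in valid_scores."""
    for candidate in sorted(set(valid_scores)):
        if specificity(candidate, valid_scores) >= min_spec:
            return candidate
    return float("inf")  # no observed cutoff qualifies; flag nobody


# Hypothetical RT-IIV scores (seconds).
healthy_valid = [0.05, 0.06, 0.06, 0.07, 0.07, 0.07, 0.08, 0.08, 0.08, 0.08,
                 0.09, 0.09, 0.09, 0.09, 0.10, 0.10, 0.10, 0.11, 0.12, 0.13]
clinical_valid = [0.07, 0.08, 0.09, 0.10, 0.11, 0.12, 0.09, 0.10, 0.13, 0.08]
biased = [0.10, 0.12, 0.14, 0.15, 0.11, 0.18, 0.16, 0.13]

cut = cutoff_for_min_specificity(healthy_valid)
print(cut, sensitivity(cut, biased), specificity(cut, healthy_valid),
      specificity(cut, clinical_valid))  # 0.11 0.875 0.85 0.7
```

In this toy example, a cutoff of 0.11 seconds flags 7 of 8 biased responders with 85% specificity in healthy controls but only 70% specificity in the clinical group. This mirrors the direction, not the magnitude, of the results reported above.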

Given  that  UR-mTBI  group  cutoff  scores  were  generally  less  specific  to  invalid 
responding  than  those  in  the  UR  group,  it  was  surprising  to  find  that  some  BEAM  and 
embedded  RVT  variables  produced  cutoff  scores  with  similar  specificities  among  clinical 
and control groups. Five variables with outstanding classification accuracy in the
simulator study had cutoff scores with similar ~15% false positive rates among the
clinical and healthy control groups: SacRT-IIV-DC (>0.104 sec cutoff score: 79%
sensitivity to BR, 84% UR-mTBI specificity, 85% UR specificity), ManRT-IIV-MDC
(>0.111 sec cutoff score: 83% sensitivity to BR, 84% UR-mTBI specificity, 85% UR
specificity), ManOm% (>1.0% cutoff: 92% sensitivity to BR, 89% UR-mTBI
specificity, 88% UR specificity), WAIS-IV Digit Span RDS (<9 cutoff: 92% sensitivity
to BR, 84% UR-mTBI specificity, 88% UR specificity), and CPT-II Omissions (>3 raw
cutoff: 71% sensitivity to BR, 84% UR-mTBI specificity, 85% UR specificity).

Several BEAM and embedded RVT variable cutoff scores achieved similar ~5%
false positive rates between the control and clinical groups. SacRT-IIV-DC (>0.125 sec
cutoff score: 79% sensitivity to BR, 95% UR-mTBI specificity, 96% UR specificity) and
ManRT-IIV-DC (>0.123 sec cutoff score: 83% sensitivity to BR, 95% UR-mTBI
specificity, 96% UR specificity) demonstrated good sensitivity to invalid responding with
high clinical and control group specificity. One representative BEAM variable,
ManRT-IIV-Overall, performed much better across clinical and control groups when using a
cutoff score with higher specificity (>0.118 sec cutoff score: 78% sensitivity to BR, 95%
UR-mTBI specificity, 96% UR specificity). ManOm% (>8.2% cutoff score: 54%
sensitivity to BR, 95% UR-mTBI specificity, 96% UR specificity) and ManRT-IIV-UC
(>0.122 sec cutoff score: 57% sensitivity to BR, 95% UR-mTBI specificity, 96% UR
specificity) demonstrated relatively lower sensitivity than other BEAM variables with
similar ~5% false positive rates among clinical and control groups. Among embedded
RVT variables, WAIS-IV Digit Span ACSS (<8 cutoff score: 71% sensitivity to BR,
95% UR-mTBI specificity, 96% UR specificity), RDS (<8 cutoff score: 67% sensitivity
to BR, 95% UR-mTBI specificity, 96% UR specificity), and RDS-R (<13 cutoff
score: 75% sensitivity to BR, 95% UR-mTBI specificity, 96% UR specificity)
demonstrated varying sensitivity while maintaining similar ~5% false positive rates
among clinical and control groups.

Among BEAM variables, SacRT-IIV-DC, ManRT-IIV-DC, ManRT-IIV-UC,
ManRT-IIV-Overall, and ManOm% demonstrated similar ~5% false positive rates across
clinical and control groups, as did SacRT-IIV-DC, ManRT-IIV-MDC, and ManOm% at
the ~15% level. Among embedded RVT variables, CPT-II Omissions, WAIS-IV Digit
Span ACSS, and WAIS-IV Digit Span RDS-R demonstrated similar ~5% false positive
rates across clinical and control groups, as did CPT-II Omissions and WAIS-IV Digit
Span RDS at the ~15% level. SacRT-IIV-DC, ManOm%, and WAIS-IV Digit Span RDS
performed similarly in clinical and control groups at both the ~5% and ~15% false
positive levels. Several BEAM variables representing divergent metrics (i.e., saccadic
and manual reaction time intra-individual variability, manual omission errors)
demonstrated high sensitivity (0.78 - 0.92) to invalid responding while minimizing false
positives (~15%) in both clinical and control groups (see Figures 3-5). These results
suggest that certain BEAM variables can be sensitive and specific to invalid responding
in mild TBI populations.

The collective results from the simulator study and initial clinical validation
provide preliminary evidence that the BEAM could be used in neuropsychological
evaluations in a manner similar to other performance measures with well-researched
embedded RVTs. If efficiency is a priority, future studies may support using the BEAM
in lieu of a lengthier freestanding RVT. At 12 minutes, the BEAM’s total administration
time (including calibration, instructions, practice, and the measure itself) is shorter than
that of the CPT-II, MSVT, or VSVT. While the present costs of using licensed copies of the
CPT-II, Digit Span, MSVT, and VSVT in clinical and research settings may be lower than the
cost of the prototype eye-tracking system used in this study to administer the BEAM,
future advances in technology will likely drive down the cost of eye-tracking hardware.
If saccadic metrics prove to be clinically useful in the assessment of cognitive
performance and response validity, the benefits of using eye-tracking may justify the
costs.

Limitations 

As with all studies, the internal and external validity of this dissertation project
were limited by study design methodology, statistical analyses, and the data that were
available to answer the pertinent research questions. The simulator study, as tightly
controlled as it was, would have benefited from a larger and more diverse
sample. Because demographic factors can influence the prevalence and type of invalid
responding behavior (10), the above-average premorbid intelligence and years of
education in all three of this study’s groups may limit the generalizability of the study’s
findings to groups of lesser intellectual capacity. While it is entirely possible that the BR
group’s above-average intelligence may have produced more sophisticated (and
harder-to-detect) faking than a sample with average or below-average intelligence, this
question could not be analyzed without a larger sample of participants.

Despite  blinding  the  examiners  to  group  assignment,  it  is  possible  that  examiners 
may  have  exhibited  scoring  biases  during  data  collection.  Specifically,  they  may  have 
biased  scores  lower  or  higher,  depending  on  the  group  to  which  they  believed  the 
participant  had  been  assigned.  Because  every  test  administered  was  either  designed  to 
identify  (or  could  be  used  to  identify)  invalid  responding,  it  could  be  expected  that 
examiners  would  correctly  guess  group  assignment  for  participants  who  appeared  to 
perform poorly on testing. Not surprisingly, post-examination feedback data indicated
that the study’s examiners correctly guessed a participant’s group assignment 96% of the
time (48/50 correct) with an average confidence of 4.73 (SD = 0.62) on a scale from 1 to
5 (with 5 being the most confident). In some research or clinical settings, this result
could pose a limitation to the internal validity of the results; however, the risk of
examiner bias in this study was largely mitigated by the automated scoring software in
the BEAM, VSVT, MSVT, and CPT-II, and the objective scoring criteria for the other
tests in the battery. Further research exploring examiners’ behavioral observations and
participants’ response strategies during response validity assessment would provide
useful complementary information to the quantitative data.

Due to the differential test length and multi-part nature of some tests (i.e., 10-minute
delays before “Part 2” of a test) in the neurocognitive battery, testing order
needed to remain fixed for all simulator study participants. Additionally, the UR-mTBI
participants took a different battery than the UR and BR groups, one which included
several additional tests and did not include the VSVT or MSVT. As such, it is possible that
test order effects influenced performance. These order effects were somewhat
mitigated  by  comparing  performance  on  similar  tests  from  two  separate  neurocognitive 
batteries  (UR  &  BR  to  UR-mTBI),  each  with  a  unique,  fixed  test  order.  Nonsignificant 
findings  between  UR  and  UR-mTBI  groups  coupled  with  significant  findings  between 
BR  and  UR-mTBI  groups  suggest  that  test  order  effects,  if  present,  were  negligible  in 
comparison  to  valid  or  invalid  responding. 

The aim of the study that sought to evaluate the incremental predictive value of the
BEAM was limited by statistical methodology and analysis software. Specifically,
logistic regression models in SPSS Version 20 become overfit when they obtain perfect
classification accuracy. Due to the large group differences in the simulator study, perfect
classification accuracy was quickly obtained with only minimal combinations of
variables. This limitation also forced this author to use a single, representative variable
for each test type rather than including several variables from the BEAM, embedded RVTs,
and freestanding RVTs. While it was not possible to evaluate every metric, the obtained
results were qualitatively consistent with those of the testable models and interpretable.
The logistic regression models provided useful adjunctive information to the ROC
analyses in Aim 1.
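In standard statistical terms, the problem described above is complete separation. When some combination of predictors classifies every case correctly, maximum likelihood estimates of the logistic regression coefficients do not exist, because the likelihood keeps improving as the coefficients grow without bound. The minimal Python sketch below uses plain gradient ascent and hypothetical one-predictor data. The gradient ascent is a deliberately simple stand-in, not SPSS's estimation routine. It shows the slope still growing on separated data while settling on overlapping data.

```python
import math


def fit_logistic_1d(xs, ys, steps, lr=0.5):
    """Gradient ascent on the mean log-likelihood of P(y = 1) = sigmoid(b0 + b1 * x)."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p
            g1 += (y - p) * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
separated = [0, 0, 0, 0, 1, 1, 1, 1]    # a single cutoff at 0 classifies perfectly
overlapping = [0, 0, 1, 0, 1, 0, 1, 1]  # no cutoff classifies perfectly

for label, ys in (("separated", separated), ("overlapping", overlapping)):
    slopes = [round(fit_logistic_1d(xs, ys, steps)[1], 2) for steps in (2000, 10000)]
    print(label, slopes)
```

In this sketch, the separated slope keeps rising with more iterations, while the overlapping slope stays essentially fixed. With separated data, statistical packages typically stop at an iteration limit and report unstable, very large coefficients and standard errors.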

Ideally,  this  project  would  have  had  the  time  and  resources  to  collect  more  data 
from  persons  with  a  history  of  mild  TBI.  As  stated  previously,  a  true  combined  groups 
study  would  have  incorporated  at  least  four  groups  to  maximize  internal  and  external 
validity: a biased experimental group (without a clinical condition), an unbiased
experimental group, a biased clinical group, and an unbiased clinical group (218; 249).
While  this  study  included  three  of  these  groups,  it  was  not  possible  to  incorporate  a 
biased  clinical  group  of  participants  with  a  history  of  mild  TBI.  As  such,  our 
understanding  of  how  a  clinical  group  would  have  demonstrated  invalid  responding  on 
the  BEAM  in  comparison  to  other  groups  is  limited. 

Other studies using known groups in mild TBI samples have traditionally used a
“gold-standard” criterion, such as failure on a freestanding RVT like the WMT (149; 151)
or TOMM (42), to split groups into biased and unbiased responders. In the absence of
freestanding RVT data in the parent study, multiple embedded RVT indices were used to
rigorously screen for invalid responding. Because embedded RVTs are generally less
sensitive than freestanding RVTs (179), it was determined that a single embedded
validity index failure at a 90% specificity cutoff disqualified a subject from the UR-mTBI
group. While this method may have excluded at least 10% of the valid responders
with a history of mild TBI from our sample, it was determined that the criteria for
inclusion in the UR-mTBI group needed to be as stringent as possible to enhance
confidence that the “valid responders” were responding validly.

Finally, it should be noted that the construct examined in this dissertation
project (i.e., “true” invalid responding) is rarely observed with certainty and rarely
induced experimentally. However, behavior that approximates true invalid responding
can be observed and experimentally induced with the proper research design, and this
approximate behavior can nonetheless serve as a useful standard of reference for future
clinical judgments. Obtaining cutoff scores from controlled experimental studies of
non-clinical groups isolates the effects of the experimentally manipulated “invalid
responding” construct. Results from these studies can then be applied to clinical groups
to examine the overlap between invalid responding and performance that reflects genuine
capacity limitations associated with a clinical condition. By following research guidelines
established by leaders in the field (34; 154; 218), this project strengthened its
contribution to response validity research.

Future  Directions 

This study was the first to examine the BEAM’s ability to identify invalid
responding in a neurocognitive assessment, and subsequent studies are needed to verify
and replicate its findings in clinical and non-clinical samples with known groups of valid
and invalid responders. The results of this study raise several important questions for
future research. Specifically, which BEAM variables will perform best in discriminating
valid from invalid responders with various neurocognitive disorders? How will the
BEAM’s ability to detect invalid responding compare with that of existing embedded and
freestanding RVTs in subsequent studies? How will BEAM performance manifest in
research, clinical, and disability evaluation contexts within the DoD/VA? How will the
BEAM’s performance be affected by fatigue, depression, anxiety, chronic pain, and other
factors that influence test-taking ability?

The BEAM’s concurrent, dual-response modality (in which participants look at
visual stimuli and press a button in response to a single target) may elicit a wide range of
invalid responding strategies, and these strategies should be examined in greater detail.
The broad set of conceptually distinct BEAM variables that achieved outstanding
classification accuracy in this study (saccadic and manual metrics, commission and
omission errors) suggests that a diagnostic algorithm could be developed for invalid
responding. This algorithm could incorporate performance across BEAM variables into
criteria for possible, probable, and definite invalid responding modeled on those of Slick
et al. (244-246).

Additional studies with clinical populations are needed to assess the BEAM’s
“real-world” classification accuracy. While this study suggested that the BEAM could
minimize false positives in a clinical sample, additional studies of valid and invalid
responders with head injuries are needed. It is currently unknown how invalid responders
with a history of mild TBI would perform on the BEAM. Other studies of persons with
clinical conditions of interest should compare BEAM performance with embedded and
freestanding RVTs in order to cross-validate the incremental predictive value of the BEAM
above and beyond existing measures.

Conclusion 

The results of this combined study strongly suggest that eye movements can be
used to detect invalid responding as well as, or better than, several existing response
validity tests. The Bethesda Eye & Attention Measure (BEAM), a novel eye-tracking
tool, appears uniquely capable of serving as a neurocognitive assessment tool with
multiple distinct embedded validity indices. The BEAM performed favorably when
compared to well-validated embedded and freestanding RVTs, including the CPT-II,
WAIS-IV Digit Span, Trail Making Test A & B, and the MSVT.

This project’s findings support the general trend of using continuous performance
tests and their associated metrics (e.g., RT, RT variability, omissions, commissions) to
detect invalid responding (42; 113; 118; 149; 169; 198; 211; 253; 264; 282). The study
adds to the extant response validity literature by demonstrating that saccadic performance
in a continuous performance test may be used to detect invalid responding. This project
also provides evidence that concurrent, dual-modality tests such as the BEAM (i.e., tests
requiring both saccadic and manual responses) may detect invalid responding better than
single-modality tests (i.e., manual-only responses) in clinical and non-clinical samples.

The generalizability of the simulator study results was greatly enhanced by the
addition of a clinical comparison group of valid responders with a history of mild TBI.

In addition to its strong performance under tightly controlled experimental conditions, the
BEAM demonstrated preliminary evidence supporting its clinical utility. Results from
the clinical group analyses suggest that the BEAM could potentially identify invalid
responding in larger, more diverse mild TBI populations while minimizing false
positives. Additional research should evaluate the BEAM’s ability to identify invalid
responding among persons with other neurological conditions such as ADHD,
HIV-associated neurocognitive disorder (HAND), and moderate-to-severe TBI. While
future studies are needed to cross-validate this project’s findings in larger, more
heterogeneous groups of persons with and without neurological conditions, this project’s
collective findings provide strong initial evidence that the BEAM can be used to detect
invalid responding in neurocognitive assessment.




APPENDIX  A:  TABLES 


Table 1: Contingency table for response validity test outcomes in a TBI population

                          Actual Diagnostic Condition
Test Result               Invalid Performance-Yes                   Invalid Performance-No
Invalid Performance-Yes   TP (true positive)                        FP (false positive = α = Type I error)
Invalid Performance-No    FN (false negative = β = Type II error)   TN (true negative)

Table 2: Embedded response validity test cutoff scores used for UR-mTBI group

Measure               Metric                               Cutoff
WAIS-IV Digit Span    Reliable Digit Span (RDS)            <7
WAIS-IV Digit Span    Age-Corrected Scaled Score (ACSS)    <7
CPT-II                Omissions                            >12 raw
CPT-II                Commissions                          >22 raw
CPT-II                Hit RT SE                            >14 raw
CPT-II                Perseverations                       >2 raw
TMT A                 Completion Time                      >63 sec
TMT B                 Completion Time                      >200 sec
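The Discussion describes the UR-mTBI inclusion rule: failing any single embedded index disqualifies a participant. That rule can be written as a short screening function over the Table 2 cutoffs. The following is a minimal Python sketch, not the procedure actually used in the parent study. The key names, function names, and participant scores are hypothetical. The comparison operators are read exactly as printed in Table 2 (strict < and >), and this is an assumption: inclusive cutoffs (≤ / ≥) may have been lost during text extraction.

```python
from typing import Callable, Dict, List

# Hypothetical encoding of the Table 2 cutoffs. Each predicate returns True when
# a score FAILS that embedded validity index. Operators are read exactly as
# printed in Table 2 (strict < and >); inclusive cutoffs are an alternative reading.
EMBEDDED_CUTOFFS: Dict[str, Callable[[float], bool]] = {
    "Digit Span RDS": lambda x: x < 7,
    "Digit Span ACSS": lambda x: x < 7,
    "CPT-II Omissions (raw)": lambda x: x > 12,
    "CPT-II Commissions (raw)": lambda x: x > 22,
    "CPT-II Hit RT SE (raw)": lambda x: x > 14,
    "CPT-II Perseverations (raw)": lambda x: x > 2,
    "TMT A Completion Time (sec)": lambda x: x > 63,
    "TMT B Completion Time (sec)": lambda x: x > 200,
}


def failed_indices(scores: Dict[str, float]) -> List[str]:
    """Return the embedded indices this participant fails.

    Indices missing from `scores` are skipped, not counted as failures.
    """
    return [
        name
        for name, fails in EMBEDDED_CUTOFFS.items()
        if name in scores and fails(scores[name])
    ]


def eligible_for_ur_mtbi(scores: Dict[str, float]) -> bool:
    """One failed index is enough to exclude a participant from UR-mTBI."""
    return not failed_indices(scores)


# Hypothetical participant, not study data.
participant = {
    "Digit Span RDS": 10,
    "Digit Span ACSS": 11,
    "CPT-II Omissions (raw)": 1,
    "CPT-II Commissions (raw)": 25,  # beyond the >22 cutoff
    "CPT-II Hit RT SE (raw)": 5,
    "CPT-II Perseverations (raw)": 0,
    "TMT A Completion Time (sec)": 24,
    "TMT B Completion Time (sec)": 58,
}
print(failed_indices(participant))        # ['CPT-II Commissions (raw)']
print(eligible_for_ur_mtbi(participant))  # False
```

Returning the list of failed indices, rather than a bare yes/no, keeps a record of why each participant was excluded.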
Table 3: Demographic characteristics of simulator study groups

                                        Biased Responders   Unbiased Responders
                                        (n=24)              (n=26)               p
Mean age in years (SD)                  28.6 (8.9)          28.4 (10.5)          .92
Mean years of education (SD)            16.9 (2.0)          16.7 (1.7)           .67
Estimated premorbid intelligence (SD)   117 (9.3)           116 (7.8)            .78
Gender, n (%)                                                                    .54
  Male                                  9 (37.5)            12 (46.2)
  Female                                15 (62.5)           14 (53.8)
Race/ethnicity, n (%)                                                            .13
  Caucasian                             16 (66.7)           21 (80.8)
  African-American                      1 (4.2)             4 (15.4)
  Hispanic                              1 (4.2)             0 (0.0)
  Asian                                 5 (20.8)            1 (3.8)
  Other                                 1 (4.2)             0 (0.0)
Knowledge of head injury sequelae (SD)  9.48 (2.63)         9.58 (2.23)          .89

Note. p values reflect results of t-tests or chi-square analyses.
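According to the note, the Table 3 p values come from t-tests or chi-square analyses. As a quick consistency check, the gender p value can be recomputed from the printed counts. The sketch below assumes a Pearson chi-square without continuity correction; the analysis software and its options are not reported, so this is an assumption. For one degree of freedom, the upper-tail probability equals erfc(√(χ²/2)), so only the Python standard library is needed.

```python
import math
from typing import Tuple


def pearson_chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]] with no continuity
    correction, plus its upper-tail p value for 1 degree of freedom."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For df = 1, P(chi-square > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Gender counts from Table 3. Rows: biased, unbiased; columns: male, female.
chi2, p = pearson_chi_square_2x2(9, 15, 12, 14)
print(f"chi-square = {chi2:.2f}, p = {p:.2f}")  # chi-square = 0.38, p = 0.54
```

Under this assumption the recomputed value matches the tabled .54. A Yates-corrected statistic would instead give a p value of about .74.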
Table 4: ROC analyses for BEAM variables in simulator study (BR vs. UR groups)

Variable                      Positive  Negative  AUC   SE    95% CI  95% CI  p
                              (BR)      (UR)                  Low     High
Saccadic RT-IIV-Overall       24        26        0.97  0.02  0.93    1.00    <.001
Manual RT-IIV-Overall         23        26        0.97  0.02  0.92    1.00    <.001
Manual RT-IIV-DC              23        26        0.97  0.02  0.92    1.00    <.001
Manual RT-IIV-NDC             23        26        0.96  0.03  0.90    1.00    <.001
Manual RT-IIV-MDC             18        26        0.96  0.03  0.90    1.00    <.001
Manual Omission Error %       24        26        0.94  0.04  0.87    1.00    <.001
Saccadic Commission Error %   24        26        0.94  0.03  0.87    1.00    <.001
Saccadic RT-IIV-DC            24        26        0.93  0.04  0.86    1.00    <.001
Manual RT-IIV-UCG             21        26        0.93  0.05  0.84    1.00    <.001
Saccadic RT-IIV-MDC           24        26        0.92  0.05  0.83    1.00    <.001
Manual RT-IIV-UC              23        26        0.92  0.04  0.84    0.99    <.001
Saccadic RT-IIV-UCG           23        26        0.90  0.05  0.80    1.00    <.001
Manual RT-Overall             23        26        0.89  0.05  0.80    0.99    <.001
Manual RT-UCG                 21        26        0.89  0.05  0.78    0.99    <.001
Manual RT-NDC                 23        26        0.88  0.05  0.78    0.98    <.001
Manual RT-MDC                 18        26        0.86  0.07  0.74    0.99    <.001
Saccadic RT-IIV-UC            24        26        0.85  0.05  0.75    0.96    <.001
Saccadic RT-IIV-NDC           24        26        0.85  0.06  0.74    0.96    <.001
Manual RT-DC                  23        26        0.84  0.06  0.73    0.96    <.001
Manual RT-UC                  23        26        0.83  0.06  0.70    0.95    <.001
Saccadic RT-MDC               24        26        0.81  0.06  0.69    0.93    <.001
Manual Commission Error %     24        26        0.80  0.07  0.66    0.94    <.001
Saccadic RT-DC                24        26        0.77  0.07  0.65    0.90    .001
Saccadic RT-Overall           24        26        0.74  0.07  0.60    0.88    .004
Saccadic RT-NDC               24        26        0.71  0.07  0.57    0.86    .01
# Invalid Initial Fixations   24        26        0.69  0.08  0.53    0.84    .02
Saccadic RT-UCG               23        26        0.61  0.09  0.44    0.78    .19
Saccadic RT-UC                24        26        0.60  0.09  0.42    0.77    .25
Saccadic Omission Error %     24        26        0.60  0.08  0.44    0.75    .25

Note. UR = unbiased responders; BR = biased responders; RT = reaction time; IIV = intra-individual variability; DC = directional cue;
NDC = nondirectional cue; MDC = misdirectional cue; UCG = uncued with gap; UC = uncued; DCR = directional cue-red arrow.
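The AUC values in Table 4 summarize how well each variable separates BR from UR participants. The nonparametric (empirical) AUC has a simple reading. It is the proportion of all BR-UR pairs in which the BR participant has the higher score, with ties counted as one half. Equivalently, it is the Mann-Whitney U statistic divided by n_BR × n_UR. The minimal sketch below uses made-up scores because participant-level data are not reproduced here. It is not necessarily how the dissertation's software computed the AUC or its standard error.

```python
from typing import Sequence


def empirical_auc(positives: Sequence[float], negatives: Sequence[float]) -> float:
    """Nonparametric ROC AUC when a higher score indicates a positive case.

    Every positive score is compared with every negative score: a win counts 1,
    a tie counts 0.5. The result equals U / (n_pos * n_neg).
    """
    if not positives or not negatives:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical intra-individual variability scores (sec), not study data.
biased = [0.15, 0.13, 0.12, 0.14]
unbiased = [0.07, 0.08, 0.12, 0.09, 0.10]
print(round(empirical_auc(biased, unbiased), 3))  # 0.975
```

An AUC of .50 means the variable does no better than chance. An AUC of 1.00 means every BR participant scored above every UR participant.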
Table 5: Non-normally distributed BEAM variables with outstanding classification
accuracy

                              Unbiased responders          Biased responders
Measures                      n    Mdn     IQR             n    Mdn     IQR             p      r
Saccadic Commission Error %   26   7.29    2.42 - 17.6     24   65.5    45.8 - 79.9     <.001  .75
Manual Omission Error %       26   0.00    0.00 - 0.63     24   9.65    3.91 - 26.11    <.001  .78
Manual RT-IIV-UCG (sec)       26   0.072   0.059 - 0.086   21   0.128   0.112 - 0.128   <.001  .72
Manual RT-IIV-UC (sec)        26   0.085   0.075 - 0.095   23   0.130   0.103 - 0.150   <.001  .71

Note. Mann-Whitney U effect size r was calculated by dividing the Z-score by the square root of N. RT = reaction time; IIV =
intra-individual variability; UCG = uncued with gap; UC = uncued.
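The Table 5 note defines the effect size as r = Z/√N. The sketch below shows that calculation with N = n1 + n2. It assumes Z comes from the large-sample normal approximation to U without a tie correction. That is an assumption: statistical packages typically apply a tie correction, and the dissertation does not say which form was used. The scores are hypothetical.

```python
import math
from typing import Sequence


def mann_whitney_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Effect size r = |Z| / sqrt(N) for a two-group Mann-Whitney U test.

    U counts the pairs in which the x value exceeds the y value (ties = 0.5).
    Z uses the normal approximation with no tie correction.
    """
    n1, n2 = len(x), len(y)
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return abs(z) / math.sqrt(n1 + n2)


# Hypothetical omission-error percentages, not study data.
unbiased = [0.0, 0.0, 0.6, 1.2, 0.0, 0.3]
biased = [9.7, 3.9, 26.1, 12.5, 0.6, 15.0]
print(round(mann_whitney_r(biased, unbiased), 2))  # 0.76
```

Because the absolute value of Z is used, r does not depend on which group is passed first.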


Table 6: Normally distributed BEAM variables with outstanding classification accuracy

                                Unbiased responders  Biased responders
Measures                        n    M       SD      n    M       SD      p      Cohen’s d
Saccadic RT-IIV-DC (sec)        26   0.078   0.024   24   0.139   0.031   <.001  2.20
Saccadic RT-IIV-MDC (sec)       26   0.080   0.023   24   0.160   0.050   <.001  2.06
Saccadic RT-IIV-UCG (sec)       26   0.076   0.021   23   0.137   0.038   <.001  1.99
Saccadic RT-IIV-Overall (sec)   26   0.082   0.018   24   0.142   0.026   <.001  2.68
Manual RT-IIV-DC (sec)          26   0.076   0.021   23   0.151   0.030   <.001  2.90
Manual RT-IIV-NDC (sec)         26   0.065   0.022   23   0.128   0.027   <.001  2.56
Manual RT-IIV-MDC (sec)         26   0.077   0.025   18   0.144   0.034   <.001  2.25
Manual RT-IIV-Overall (sec)     26   0.076   0.018   23   0.134   0.022   <.001  2.89

Note. RT = reaction time; IIV = intra-individual variability; DC = directional cue; NDC = nondirectional cue; MDC = misdirectional
cue; UCG = uncued with gap; UC = uncued.
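Table 6 reports Cohen’s d without stating which standardizer was used. The sketch below assumes the common pooled-standard-deviation form. Applied to the rounded Saccadic RT-IIV-DC summary statistics in the table, it gives about 2.21, close to the tabled 2.20. The small gap is plausibly due to rounding of the printed means and SDs.

```python
import math


def cohens_d_pooled(m1: float, sd1: float, n1: int,
                    m2: float, sd2: float, n2: int) -> float:
    """Cohen's d = (m2 - m1) / pooled SD, with the (n - 1)-weighted pooled variance."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m2 - m1) / math.sqrt(pooled_var)


# Saccadic RT-IIV-DC (sec) from Table 6: unbiased (group 1) vs. biased (group 2).
d = cohens_d_pooled(0.078, 0.024, 26, 0.139, 0.031, 24)
print(round(d, 2))  # 2.21
```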
Table 7: Cumulative percentages of persons with scores above the indicated cutoff on
selected BEAM variables

Variable
  Cutoff    % BR(a) % UR(b) Hit Rate LR+    LR-    PPV   NPV

SacRT-IIV-DC (sec)
  >0.142    50                                     100   68
  0.137     54                                     100   70
  0.131     58                                     100   72
  0.130     63                                     100   74
  0.129     67                                     100   76
  0.128     71                                     100   79
  0.127     71      4       84       18.4   7.6    94    78
  0.126     75      4       86       19.5   6.5    95    81
  0.125     79      4       88       20.6   5.4    95    83
  0.119     79      8       86       10.3   2.7    90    83
  0.113     79      12      84       6.9    1.8    86    82
  0.109     79      15      82       5.2    1.4    83    81
  0.104     83      15      84       5.4    1.1    83    85

SacRT-IIV-MDC (sec)
  >0.144    63                                     100   74
  0.142     67                                     100   76
  0.138     71                                     100   79
  0.132     75                                     100   81
  0.129     79                                     100   84
  0.126     83                                     100   87
  0.123     83      4       90       21.7   4.3    95    86
  0.122     88      4       92       22.8   3.3    95    89
  0.119     88      8       90       11.4   1.6    91    89
  0.115     88      12      88       7.6    1.1    88    88
  0.112     88      15      86       5.7    0.8    84    88

SacRT-IIV-UCG (sec)
  >0.137    61                                     100   74
  0.136     65                                     100   76
  0.131     70                                     100   79
  0.125     74                                     100   81
  0.122     78                                     100   84
  0.117     78      4       86       20.4   5.7    95    83
  0.109     78      8       84       10.2   2.8    90    83
  0.102     78      12      82       6.8    1.9    86    82
  0.099     78      15      80       5.1    1.4    82    81
  0.097     83      15      82       5.4    1.1    83    85
  0.095     87      15      84       5.7    0.9    83    88

SacRT-IIV-Overall (sec)
  >0.133    67                                     100   76
  0.131     71                                     100   79
  0.126     75                                     100   81
  0.121     79                                     100   84
  0.118     79      4       88       20.6   5.4    95    83
  0.117     83      4       90       21.7   4.3    95    86
  0.116     83      4       88       10.8   2.2    91    86
  0.115     88      8       90       11.4   1.6    91    89
  0.112     88      12      88       7.6    1.1    88    88
  0.108     88      15      86       5.7    0.8    84    88
  0.106     92      15      88       6.0    0.5    85    92
  0.102     96      15      90       6.2    0.3    85    96

ManRT-IIV-DC (sec)
  >0.147    61                                     100   74
  0.145     65                                     100   76
  0.142     70                                     100   79
  0.137     74                                     100   81
  0.130     78                                     100   84
  0.127     83                                     100   87
  0.123     83      4       88       21.5   4.5    95    86
  0.120     83      8       86       10.7   2.3    90    86
  0.117     87      8       88       11.3   1.7    91    89
  0.105     91      8       90       11.9   1.1    91    92
  0.094     91      12      88       7.9    0.8    88    92
  0.093     91      15      86       5.9    0.6    84    92

ManRT-IIV-NDC (sec)
  >0.138    43                                     100   67
  0.134     48                                     100   68
  0.129     52                                     100   70
  0.126     57                                     100   72
  0.119     61                                     100   74
  0.113     65                                     100   76
  0.111     65      4       80       17.0   9.0    94    76
  0.110     70      4       82       18.1   7.9    94    78
  0.109     74      4       84       19.2   6.8    94    81
  0.108     78      4       86       20.4   5.7    95    83
  0.106     83      4       88       21.5   4.5    95    86
  0.104     87      4       90       22.6   3.4    95    89
  0.101     91      4       92       23.7   2.3    95    93
  0.099     91      8       90       11.9   1.1    91    92
  0.094     91      15      86       5.9    0.6    84    92
  0.092     96      15      88       6.2    0.3    85    96

ManRT-IIV-MDC (sec)
  >0.141    44                                     100   72
  0.136     50                                     100   74
  0.132     56                                     100   76
  0.127     61                                     100   79
  0.125     67                                     100   81
  0.123     72                                     100   84
  0.120     72      4       76       18.8   7.2    93    83
  0.118     72      8       74       9.4    3.6    87    83
  0.116     78      8       76       10.1   2.9    88    86
  0.114     78      12      74       6.7    1.9    82    85
  0.111     83      15      74       5.4    1.1    79    88

ManRT-IIV-UCG (sec)
  >0.140    19                                     100   60
  0.138     24                                     100   62
  0.135     29                                     100   63
  0.132     33                                     100   65
  0.131     38                                     100   67
  0.130     43                                     100   68
  0.129     43      4       68       11.1   14.9   90    68
  0.128     48      4       70       12.4   13.6   91    69
  0.127     52      4       72       13.6   12.4   92    71
  0.126     57      4       74       14.9   11.1   92    74
  0.125     62      4       76       16.1   9.9    93    76
  0.123     67      4       78       17.3   8.7    93    78
  0.118     71      4       80       18.6   7.4    94    81
  0.115     76      4       82       19.8   6.2    94    83
  0.112     76      8       80       9.9    3.1    89    83
  0.107     81      8       82       10.5   2.5    89    86
  0.103     86      8       84       11.1   1.9    90    89
  0.100     86      12      82       7.4    1.2    86    88
  0.097     86      15      80       5.6    0.9    82    88
  0.096     90      15      82       5.9    0.6    83    92
  0.095     95      15      84       6.2    0.3    83    96

ManRT-IIV-UC (sec)
  >0.154    4                                      100   54
  0.153     9                                      100   55
  0.151     13                                     100   57
  0.151     17                                     100   58
  0.150     22                                     100   59
  0.146     26                                     100   60
  0.140     26      4       62       6.8    19.2   86    60
  0.136     30      4       64       7.9    18.1   88    61
  0.134     35      4       66       9.0    17.0   89    63
  0.132     39      4       68       10.2   15.8   90    64
  0.131     43      4       70       11.3   14.7   91    66
  0.130     48      4       72       12.4   13.6   92    68
  0.126     52      4       74       13.6   12.4   92    69
  0.122     57      4       76       14.7   11.3   93    71
  0.119     61      4       78       15.8   10.2   93    74
  0.117     61      8       76       7.9    5.1    88    73
  0.111     65      8       78       8.5    4.5    88    75
  0.105     65      12      76       5.7    3.0    83    74
  0.104     74      15      78       4.8    1.7    81    79

ManRT-IIV-Overall (sec)
  >0.136    57                                     100   72
  0.135     61                                     100   74
  0.132     65                                     100   76
  0.128     70                                     100   79
  0.127     74                                     100   81
  0.122     78                                     100   84
  0.117     78      4       86       20.4   5.7    95    83
  0.117     78      8       84       10.2   2.8    90    83
  0.111     83      8       86       10.7   2.3    90    86
  0.106     87      8       88       11.3   1.7    91    89
  0.099     91      8       90       11.9   1.1    91    92
  0.094     96      8       92       12.4   0.6    92    96
  0.093     96      12      90       8.3    0.4    88    96
  0.092     96      15      88       6.2    0.3    85    96

Saccadic Commissions (% of DCR trials)
  >57.9     58                                     100   72
  55.4      63                                     100   74
  53.3      67                                     100   76
  52.6      71                                     100   79
  47.9      75                                     100   81
  43.5      79                                     100   84
  43.1      83      4       90       21.7   4.3    95    86
  41.8      83      8       88       10.8   2.2    91    86
  32.9      83      12      86       7.2    1.4    87    85
  24.0      83      15      84       5.4    1.1    83    85

Manual Omissions (% of non-DCR trials)
  >25.5     25                                     100   59
  20.8      29                                     100   60
  17.4      29      4       64       7.6    18.4   88    60
  17.3      33      4       66       8.7    17.3   89    61
  17.1      38      4       68       9.8    16.3   90    63
  14.1      42      4       70       10.8   15.2   91    64
  10.7      46      4       72       11.9   14.1   92    66
  9.7       50      4       74       13.0   13.0   92    68
  8.2       54      4       76       14.1   11.9   93    69
  7.1       58      4       78       15.2   10.8   93    71
  5.9       67      4       82       17.3   8.7    94    76
  5.2       71      4       84       18.4   7.6    94    78
  4.3       75      4       86       19.5   6.5    95    81
  3.3       79      4       88       20.6   5.4    95    83
  3.0       83      4       90       21.7   4.3    95    86
  2.7       88      4       92       22.8   3.3    95    89
  2.4       92      4       94       23.8   2.2    96    93
  1.7       92      8       92       11.9   1.1    92    92
  1.0       92      12      90       7.9    0.7    88    92

Note. BR = biased responders; UR = unbiased responders; LR = likelihood ratio; Sac = saccadic; Man = manual; RT = reaction time;
IIV = intra-individual variability; DC = directional cue; NDC = nondirectional cue; MDC = misdirectional cue; UCG = uncued with
gap; UC = uncued; DCR = directional cue-red arrow.

(a) n=24 for all BR variables except Saccadic RT-IIV-UCG (n=23), Manual RT-IIV-DC (n=23), Manual RT-IIV-NDC (n=23), Manual
RT-IIV-MDC (n=18), Manual RT-IIV-UCG (n=21), Manual RT-IIV-UC (n=23), and Manual RT-IIV-Overall (n=23).
(b) n=26 for all UR variables.
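Each row of Tables 7, 11, and 14 can be derived from two counts: how many of the BR and UR participants fall beyond the cutoff. The minimal sketch below reproduces the SacRT-IIV-DC row at cutoff 0.127. It uses 17 of 24 BR and 1 of 26 UR above the cutoff; these counts are back-calculated from the rounded percentages. PPV and NPV use the study's own base rate (24 of 50), so they would change at other base rates. Checking several rows suggests the LR- column was computed as (1 − sensitivity)/(1 − specificity). The conventional negative likelihood ratio is (1 − sensitivity)/specificity, so the sketch returns both. The function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CutoffStats:
    hit_rate: float         # (TP + TN) / N
    lr_pos: float           # sensitivity / (1 - specificity)
    lr_neg_tabled: float    # (1 - sensitivity) / (1 - specificity): what the LR- column appears to report
    lr_neg_standard: float  # (1 - sensitivity) / specificity: conventional LR-
    ppv: float              # TP / (TP + FP) at the sample base rate
    npv: float              # TN / (TN + FN) at the sample base rate


def cutoff_stats(br_flagged: int, n_br: int, ur_flagged: int, n_ur: int) -> CutoffStats:
    """Statistics for one cutoff. Flagged BR participants are true positives;
    flagged UR participants are false positives."""
    if ur_flagged == 0:
        # Matches the blank LR cells in the tables: 1 - specificity would be 0.
        raise ValueError("likelihood ratios are undefined when no UR participant is flagged")
    tp, fn = br_flagged, n_br - br_flagged
    fp, tn = ur_flagged, n_ur - ur_flagged
    sens = tp / n_br
    spec = tn / n_ur
    return CutoffStats(
        hit_rate=(tp + tn) / (n_br + n_ur),
        lr_pos=sens / (1 - spec),
        lr_neg_tabled=(1 - sens) / (1 - spec),
        lr_neg_standard=(1 - sens) / spec,
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
    )


# SacRT-IIV-DC, cutoff 0.127 in Table 7: 17 of 24 BR and 1 of 26 UR above the cutoff.
s = cutoff_stats(br_flagged=17, n_br=24, ur_flagged=1, n_ur=26)
print(f"hit rate {s.hit_rate:.0%}, LR+ {s.lr_pos:.1f}, LR- as tabled {s.lr_neg_tabled:.1f}, "
      f"conventional LR- {s.lr_neg_standard:.2f}, PPV {s.ppv:.0%}, NPV {s.npv:.0%}")
# hit rate 84%, LR+ 18.4, LR- as tabled 7.6, conventional LR- 0.30, PPV 94%, NPV 78%
```

Other rows can be checked the same way. For example, Table 14's VSVT Easy RT row at 1.93 (12 of 24 BR, 1 of 26 UR) gives LR+ = 13.0 and a tabled-style LR- of 13.0.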
Table 8: ROC analyses for embedded RVT variables in the simulator study

Variable                              Positive  Negative  AUC   SE    95% CI  95% CI  p
                                      (BR)      (UR)                  Low     High
WAIS-IV Digit Span RDS-R              24        26        0.94  0.03  0.88    1.00    <.001
WAIS-IV Digit Span ACSS               24        26        0.94  0.03  0.88    1.00    <.001
WAIS-IV Digit Span ARDS               24        26        0.94  0.04  0.86    1.00    <.001
WAIS-IV Digit Span RDS                24        26        0.93  0.04  0.84    1.00    <.001
CPT-II Commissions (raw)              24        26        0.93  0.04  0.85    1.00    <.001
CPT-II Omissions (raw)                24        26        0.91  0.05  0.82    1.00    <.001
CPT-II Hit RT SE (raw)                24        26        0.90  0.04  0.82    0.99    <.001
CPT-II Variability (raw)              24        26        0.89  0.05  0.80    0.98    <.001
CPT-II Detectability (raw)            24        26        0.88  0.05  0.77    0.98    <.001
CPT-II Perseverations (raw)           24        26        0.84  0.06  0.73    0.96    <.001
CPT-II RT ISI Change (raw)            24        26        0.79  0.06  0.67    0.91    <.001
Trail Making Test B Time (sec)        24        26        0.79  0.06  0.66    0.91    .001
Trail Making Test A Time (sec)        24        26        0.74  0.08  0.59    0.90    .003
CPT-II SE ISI Change (raw)            24        26        0.67  0.08  0.52    0.82    .04
CPT-II Response Style (raw)           24        26        0.67  0.08  0.51    0.82    .05
CPT-II Hit RT SE Block Change (raw)   24        26        0.54  0.08  0.37    0.70    .67
CPT-II Hit RT (raw)                   24        26        0.52  0.09  0.35    0.69    .83
CPT-II Hit RT Block Change (raw)      24        26        0.50  0.09  0.33    0.66    .96

Note. WAIS-IV = Wechsler Adult Intelligence Scale-4th Edition; RDS-R = Reliable Digit Span-Revised; ACSS = Age-Corrected
Scaled Score; ARDS = Alternative Reliable Digit Span; RDS = Reliable Digit Span; CPT-II = Conners’ Continuous Performance
Test-2nd Edition; RT = reaction time; SE = standard error; ISI = interstimulus interval.


Table 9: Non-normally distributed embedded RVT variables with outstanding
classification accuracy

                        Unbiased responders     Biased responders
                        (n = 26)                (n = 24)
Measures                Mdn     IQR             Mdn     IQR             p      r
WAIS-IV Digit Span
  RDS-R                 17.0    16.0 - 18.0     11.0    10.0 - 13.8     <.001  .77
  ARDS                  12.0    11.8 - 13.0     8.50    7.00 - 10.0     <.001  .76
CPT-II
  Omissions (raw)       0.00    0.00 - 1.00     6.50    2.00 - 11.0     <.001  .72
  Hit RT SE (raw)       4.20    3.30 - 4.82     7.56    5.33 - 9.91     <.001  .69

Note. Mann-Whitney U effect size r was calculated by dividing the Z-score by the square root of N. WAIS-IV = Wechsler Adult
Intelligence Scale-4th Edition; RDS-R = Reliable Digit Span-Revised; ARDS = Alternative Reliable Digit Span; RT = reaction time;
SE = standard error.


Table 10: Normally distributed embedded RVT variables with outstanding classification
accuracy

                        Unbiased responders Biased responders
                        (n = 26)            (n = 24)
Measures                M         SD        M         SD        p      Cohen’s d
WAIS-IV Digit Span
  ACSS                  12.2      2.43      6.63      2.57      <.001  2.23
  RDS                   11.1      1.80      7.29      2.12      <.001  1.94
Conners’ CPT-II
  Commissions (raw)     10.4      4.00      20.6      5.34      <.001  2.16
Table 11: Cumulative percentages of persons with scores above or below the indicated
cutoff on selected embedded RVT variables

Test Variable
               % BR    % UR
  Cutoff       (n=24)  (n=26)  Hit Rate LR+    LR-    PPV   NPV

WAIS-IV Digit Span: ACSS
  <4           29                                     100   60
  5            33                                     100   62
  6            46      4       72       11.9   14.1   92    66
  7            67      4       82       17.3   8.7    94    76
  8            71      4       84       18.4   7.6    94    78
  9            92      12      90       7.9    0.7    88    92
  10           92      15      88       6.0    0.5    85    92

WAIS-IV Digit Span: RDS
  <5           25                                     100   59
  6            42                                     100   65
  7            54      4       76       14.1   11.9   93    69
  8            67      4       82       17.3   8.7    94    76
  9            92      12      90       7.9    0.7    88    92

WAIS-IV Digit Span: RDS-R
  <9           21                                     100   58
  10           33                                     100   62
  11           54                                     100   70
  12           63      4       80       16.3   9.8    94    74
  13           75      4       86       19.5   6.5    95    81
  14           88      8       90       11.4   1.6    91    89
  15           92      12      90       7.9    0.7    88    92

WAIS-IV Digit Span: ARDS
  <7           33                                     100   62
  8            50                                     100   68
  9            67                                     100   76
  10           88      8       90       11.4   1.6    91    89

CPT-II: Commissions (raw)
  >24          29                                     100   60
  22           42                                     100   65
  21           67                                     100   76
  20           71                                     100   79
  19           75      4       86       19.5   6.5    95    81
  18           79      4       88       20.6   5.4    95    83
  17           79      12      84       6.9    1.8    86    82
  16           83      12      86       7.2    1.4    87    85
  15           83      15      84       5.4    1.1    83    85

CPT-II: Omissions (raw)
  >9           42                                     100   65
  7            50                                     100   68
  6            63                                     100   74
  5            67                                     100   76
  4            67      4       82       17.3   8.7    94    76
  3            71      15      78       4.6    1.9    81    76
  2            92      15      88       6.0    0.5    85    92

CPT-II: Hit RT SE (sec)
  >9.2         38                                     100   63
  [illegible]  42                                     100   65
  8.4          46                                     100   67
  7.6          50                                     100   68
  6.7          50      4       74       13.2   13.2   92    68
  5.8          71      4       84       18.6   7.7    95    78
  5.6          71      8       82       9.2    3.8    89    77
  5.4          75      8       84       9.7    3.2    90    80
  5.3          75      12      82       6.5    2.2    86    79
  5.2          79      12      84       6.9    1.8    86    82
  5.1          79      15      82       5.1    1.4    83    82
  5.0          88      15      86       5.7    0.8    84    88

Note. BR = biased responders; UR = unbiased responders; LR = likelihood ratio; WAIS-IV = Wechsler Adult Intelligence Scale-4th
Edition; ACSS = age-corrected scaled score; RDS = reliable digit span; RDS-R = reliable digit span-revised; ARDS = alternative
reliable digit span; CPT-II = Conners’ Continuous Performance Test-II; RT = reaction time; SE = standard error.
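Tables 7, 11, and 14 are cumulative. Each successive cutoff flags everyone already flagged, plus the participants at the next score level. For accuracy-type scores such as Digit Span, low scores indicate invalid responding, so the sweep runs upward through "at or below" cutoffs. For error and timing scores it runs downward through "at or above" cutoffs. The minimal sketch below uses hypothetical scores. Treating cutoffs as inclusive is an assumption, because some comparison symbols in the extracted tables are ambiguous.

```python
from typing import List, Sequence, Tuple


def cumulative_table(
    br: Sequence[float], ur: Sequence[float], low_is_invalid: bool
) -> List[Tuple[float, float, float]]:
    """Rows of (cutoff, % BR flagged, % UR flagged) for every observed score.

    If low_is_invalid, a participant is flagged when score <= cutoff and the
    cutoffs run from low to high; otherwise flagged when score >= cutoff and
    the cutoffs run from high to low. Percentages therefore never decrease.
    """
    cutoffs = sorted(set(br) | set(ur), reverse=not low_is_invalid)
    rows = []
    for c in cutoffs:
        if low_is_invalid:
            n_br = sum(1 for s in br if s <= c)
            n_ur = sum(1 for s in ur if s <= c)
        else:
            n_br = sum(1 for s in br if s >= c)
            n_ur = sum(1 for s in ur if s >= c)
        rows.append((c, 100 * n_br / len(br), 100 * n_ur / len(ur)))
    return rows


# Hypothetical Reliable Digit Span scores (lower = more suspicious), not study data.
biased = [5, 6, 7, 9]
unbiased = [9, 10, 11, 12]
for cutoff, pct_br, pct_ur in cumulative_table(biased, unbiased, low_is_invalid=True):
    print(f"<= {cutoff:>2}: {pct_br:5.1f}% BR, {pct_ur:5.1f}% UR")
```

Each (cutoff, % BR, % UR) row can then be converted to hit rate, likelihood ratios, PPV, and NPV from the underlying counts.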
Table 12: ROC analyses for freestanding RVT variables

Variable                            Positive  Negative  AUC   SE    95% CI  95% CI  p
                                    (BR)      (UR)                  Low     High
VSVT Difficult Correct              24        26        1.00  0.00  1.00    1.00    <.001
VSVT Total Correct                  24        26        1.00  0.00  1.00    1.00    <.001
MSVT FR                             24        26        0.96  0.03  0.91    1.00    <.001
MSVT CNS                            24        26        0.94  0.04  0.87    1.00    <.001
MSVT PA                             24        26        0.93  0.04  0.85    1.00    <.001
VSVT Total RT (sec)                 24        26        0.93  0.04  0.86    1.00    <.001
MSVT DR                             24        26        0.92  0.04  0.84    1.00    <.001
VSVT Difficult RT (sec)             24        26        0.91  0.05  0.82    0.99    <.001
VSVT Easy RT (sec)                  24        26        0.90  0.04  0.81    0.98    <.001
VSVT Difficult RT SD (sec)          24        26        0.90  0.05  0.81    0.99    <.001
MSVT IR                             24        26        0.90  0.05  0.80    1.00    <.001
MSVT Fail Any Subtest (per norms)   24        26        0.90  0.05  0.80    1.00    <.001
VSVT Total RT SD (sec)              24        26        0.87  0.05  0.77    0.98    <.001
VSVT Easy RT SD (sec)               24        26        0.85  0.06  0.74    0.96    <.001
VSVT Easy Correct                   24        26        0.85  0.06  0.73    0.96    <.001

Note. VSVT = Victoria Symptom Validity Test; MSVT = Medical Symptom Validity Test; FR = Free Recall % Correct; CNS =
Consistency % Correct; PA = Paired Associates % Correct; RT = response latency; DR = Delayed Recall % Correct; SD = standard
deviation; IR = Immediate Recall % Correct.

Table 13: Non-normally distributed freestanding RVT variables with outstanding
classification accuracy

                          Unbiased responders     Biased responders
                          (n = 26)                (n = 24)
Measures                  Mdn     IQR             Mdn     IQR             p      r
VSVT
  Difficult Correct       24.0    22.8 - 24.0     12.5    10.0 - 13.0     <.001  .87
  Total Correct           48.0    46.8 - 48.0     32.5    27.8 - 36.8     <.001  .87
  Easy RT (sec)           1.02    0.88 - 1.10     1.93    1.31 - 2.96     <.001  .68
  Difficult RT (sec)      1.72    1.43 - 1.90     3.45    2.26 - 5.36     <.001  .69
  Difficult RT SD (sec)   0.49    0.38 - 0.64     1.14    0.76 - 2.72     <.001  .68
  Total RT (sec)          1.38    1.14 - 1.51     2.91    1.68 - 4.38     <.001  .73
MSVT
  IR                      100     100 - 100       85.0    75.0 - 95.0     <.001  .78
  DR                      100     100 - 100       75.0    56.3 - 87.8     <.001  .78
  CNS                     100     100 - 100       65.0    60.0 - 93.8     <.001  .81
  PA                      100     100 - 100       70.0    60.0 - 87.5     <.001  .82
  FR                      90.0    80.0 - 90.0     55.0    41.3 - 63.8     <.001  .79
  Fail Any Subtest        0.00    0.00 - 0.00     1.00    1.30 - 4.00     <.001  .81(a)

Note. VSVT = Victoria Symptom Validity Test; RT = response latency; SD = standard deviation; MSVT = Medical Symptom
Validity Test; FR = Free Recall % Correct; CNS = Consistency % Correct; PA = Paired Associates % Correct; DR = Delayed Recall
% Correct; IR = Immediate Recall % Correct.
(a) Phi value.
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Table  14:  Cumulative  percentages  of  persons  with  scores  above  or  below  the  indicated 
cutoff  on  selected  freestanding  RVT  variables 


Test 

Variable 

Cutoff 

%BR 

(n=24) 

%  UR 

( n=26 ) 

Hit 

Rate 

LR+ 

LR- 

PPV 

NPV 

VSVT 

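Easy RT (sec)
>1.99 | 50 | 0 | 76 |  |  | 100 | 68
1.93 | 50 | 4 | 74 | 13.0 | 13.0 | 92 | 68
1.87 | 50 | 8 | 72 | 6.5 | 6.5 | 86 | 67
1.78 | 54 | 8 | 74 | 7.0 | 6.0 | 87 | 69
1.70 | 58 | 8 | 76 | 7.6 | 5.4 | 88 | 71
1.59 | 63 | 8 | 78 | 8.1 | 4.9 | 88 | 73
1.51 | 63 | 12 | 76 | 5.4 | 3.3 | 83 | 72
1.49 | 67 | 12 | 78 | 5.8 | 2.9 | 84 | 74
1.40 | 71 | 12 | 80 | 6.1 | 2.5 | 85 | 77
1.31 | 75 | 12 | 82 | 6.5 | 2.2 | 86 | 79
1.25 | 79 | 12 | 84 | 6.9 | 1.8 | 86 | 82
1.17 | 83 | 12 | 86 | 7.2 | 1.4 | 87 | 85
1.14 | 88 | 12 | 88 | 7.6 | 1.1 | 88 | 88

Difficult RT (sec)
>3.28 | 54 | 0 | 78 |  |  | 100 | 70
3.09 | 54 | 4 | 76 | 14.1 | 11.9 | 93 | 69
2.99 | 58 | 4 | 78 | 15.2 | 10.8 | 93 | 71
2.78 | 63 | 4 | 80 | 16.3 | 9.8 | 94 | 74
2.53 | 67 | 4 | 82 | 17.3 | 8.7 | 94 | 76
2.36 | 67 | 8 | 80 | 8.7 | 4.3 | 89 | 75
2.29 | 75 | 8 | 84 | 9.8 | 3.3 | 90 | 80
2.26 | 75 | 12 | 82 | 6.5 | 2.2 | 86 | 79
2.15 | 79 | 12 | 84 | 6.9 | 1.8 | 86 | 82
2.02 | 83 | 12 | 86 | 7.2 | 1.4 | 87 | 85

Difficult RT SD (sec)
>1.57 | 46 | 0 | 74 |  |  | 100 | 67
1.43 | 46 | 4 | 72 | 12.1 | 14.3 | 92 | 66
1.15 | 50 | 4 | 74 | 13.2 | 13.2 | 92 | 68
0.94 | 50 | 8 | 72 | 6.5 | 6.5 | 86 | 67
0.93 | 54 | 8 | 74 | 7.0 | 5.9 | 87 | 69
0.91 | 58 | 8 | 76 | 7.6 | 5.4 | 87 | 71
0.89 | 58 | 12 | 74 | 5.1 | 3.6 | 82 | 70
0.86 | 67 | 12 | 78 | 5.8 | 2.9 | 84 | 74
0.80 | 71 | 12 | 80 | 6.2 | 2.5 | 85 | 77
0.76 | 75 | 12 | 82 | 6.5 | 2.2 | 86 | 79
0.73 | 79 | 12 | 84 | 6.9 | 1.8 | 86 | 82

Total RT (sec)
>2.67 | 58 | 0 | 80 |  |  | 100 | 72
2.38 | 58 | 4 | 78 | 15.2 | 10.8 | 93 | 71
2.14 | 63 | 4 | 80 | 16.3 | 9.8 | 94 | 74
1.97 | 67 | 4 | 82 | 17.3 | 8.7 | 94 | 76
1.83 | 71 | 4 | 84 | 18.4 | 7.6 | 94 | 78
1.74 | 71 | 8 | 82 | 9.2 | 3.8 | 89 | 77
1.70 | 75 | 8 | 84 | 9.8 | 3.3 | 90 | 80
1.68 | 75 | 12 | 82 | 6.5 | 2.2 | 86 | 79
1.66 | 79 | 12 | 84 | 6.9 | 1.8 | 86 | 82
1.62 | 88 | 12 | 88 | 7.6 | 1.1 | 88 | 88
1.60 | 88 | 15 | 86 | 5.7 | 0.8 | 84 | 88

Difficult Correct
<12 | 50 | 0 |  |  |  | 100 | 68
13 | 79 | 0 |  |  |  | 100 | 84
14 | 88 | 0 |  |  |  | 100 | 90
15 | 92 | 0 |  |  |  | 100 | 93
18 | 100 | 0 |  |  |  | 100 | 100
21 | 100 | 15 | 92 | 6.5 | 0 | 86 | 100

Total Correct
<35 | 67 | 0 |  |  |  | 100 | 76
36 | 75 | 0 |  |  |  | 100 | 81
38 | 92 | 0 |  |  |  | 100 | 93
39 | 96 | 0 |  |  |  | 100 | 96
42 | 100 | 0 |  |  |  | 100 | 100
45 | 100 | 15 | 92 | 6.5 | 0 | 86 | 100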
Test Variable / Cutoff | % BR (n=24) | % UR (n=26) | Hit Rate | LR+ | LR- | PPV | NPV

MSVT IR (% correct)
<70 | 13 | 0 | 58 |  |  | 100 | 55
75 | 38 | 0 | 70 |  |  | 100 | 63
80 | 42 | 0 | 72 |  |  | 100 | 65
85 | 54 | 0 | 78 |  |  | 100 | 70
90 | 71 | 0 | 86 |  |  | 100 | 79
95 | 79 | 0 | 90 |  |  | 100 | 84

DR (% correct)
<65 | 38 | 0 | 70 |  |  | 100 | 63
70 | 46 | 0 | 74 |  |  | 100 | 67
75 | 54 | 0 | 78 |  |  | 100 | 70
85 | 75 | 0 | 88 |  |  | 100 | 81
90 | 79 | 4 | 88 | 20.6 | 5.4 | 95 | 83
95 | 88 | 12 | 88 | 7.6 | 1.1 | 88 | 88

CNS (% correct)
<70 | 54 | 0 | 78 |  |  | 100 | 70
75 | 63 | 0 | 82 |  |  | 100 | 74
80 | 67 | 0 | 84 |  |  | 100 | 76
85 | 71 | 0 | 86 |  |  | 100 | 79
90 | 75 | 4 | 86 | 19.5 | 6.5 | 95 | 81
95 | 92 | 12 | 90 | 7.9 | 0.7 | 88 | 92

PA (% correct)
<45 | 8 | 0 | 56 |  |  | 100 | 54
55 | 17 | 0 | 60 |  |  | 100 | 57
65 | 38 | 0 | 70 |  |  | 100 | 63
75 | 67 | 0 | 84 |  |  | 100 | 76
85 | 75 | 0 | 88 |  |  | 100 | 81
95 | 88 | 4 | 92 | 22.75 | 3.25 | 95 | 89

FR (% correct)
<45 | 29 | 0 | 66 |  |  | 100 | 60
50 | 42 | 0 | 72 |  |  | 100 | 65
55 | 54 | 0 | 78 |  |  | 100 | 70
60 | 75 | 4 | 86 | 19.5 | 6.5 | 95 | 81
65 | 83 | 4 | 90 | 21.7 | 4.3 | 95 | 86

Fail Any Subtest (# subtests failed)
5 | 0 | 0 | 52 |  |  | 100 | 52
4 | 13 | 0 | 58 |  |  | 100 | 55
3 | 54 | 0 | 78 |  |  | 100 | 70
2 | 71 | 0 | 86 |  |  | 100 | 79
1 | 75 | 0 | 88 |  |  | 100 | 81

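Note: BR = biased responders; UR = unbiased responders; LR+ = positive likelihood ratio; LR- = negative likelihood ratio; PPV = positive predictive value; NPV = negative predictive value; VSVT = Victoria Symptom Validity Test; RT = response latency; SD = standard deviation; MSVT = Medical Symptom Validity Test; IR = Immediate Recall % Correct; DR = Delayed Recall % Correct; CNS = Consistency % Correct; PA = Paired Associates % Correct; FR = Free Recall % Correct.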
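The hit rate, LR+, PPV, and NPV columns above can be recomputed from the number of participants in each group who fall beyond a cutoff. The Python sketch below is illustrative only. The helper name `classification_stats` is hypothetical. It assumes the simulator-study group sizes (24 biased and 26 unbiased responders), evaluates PPV and NPV at the study's own base rate rather than a clinical base rate, and takes LR+ as sensitivity divided by the false-positive rate.

```python
from dataclasses import dataclass


@dataclass
class ClassificationStats:
    sensitivity: float  # proportion of biased responders flagged
    specificity: float  # proportion of unbiased responders not flagged
    hit_rate: float     # overall proportion correctly classified
    lr_positive: float  # sensitivity / false-positive rate (inf if no false positives)
    ppv: float          # flagged cases that are truly biased responders
    npv: float          # unflagged cases that are truly unbiased responders


def classification_stats(flagged_br: int, n_br: int,
                         flagged_ur: int, n_ur: int) -> ClassificationStats:
    """Hypothetical helper: classification statistics for a single cutoff.

    flagged_br: biased responders beyond the cutoff (true positives)
    flagged_ur: unbiased responders beyond the cutoff (false positives)
    """
    tp, fn = flagged_br, n_br - flagged_br
    fp, tn = flagged_ur, n_ur - flagged_ur
    sens = tp / n_br
    spec = tn / n_ur
    fpr = fp / n_ur
    lr_pos = sens / fpr if fpr > 0 else float("inf")
    ppv = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    npv = tn / (tn + fn) if (tn + fn) > 0 else float("nan")
    hit = (tp + tn) / (n_br + n_ur)
    return ClassificationStats(sens, spec, hit, lr_pos, ppv, npv)


# A cutoff that flags 12 of 24 biased and 1 of 26 unbiased responders
stats = classification_stats(flagged_br=12, n_br=24, flagged_ur=1, n_ur=26)
print(f"sensitivity={stats.sensitivity:.0%} hit rate={stats.hit_rate:.0%} "
      f"LR+={stats.lr_positive:.1f} PPV={stats.ppv:.0%} NPV={stats.npv:.0%}")
```

Suppose 12 of 24 biased and 1 of 26 unbiased responders are flagged. The sketch then gives a hit rate of 74%, LR+ of 13.0, PPV of 92%, and NPV of 68%, which matches the 1.93-second Easy RT row.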
Table 15: Hierarchical logistic regressions examining incremental validity of representative variables

Model | Block | Scale | Model χ² (df) | p | χ² Change (df) | p | R² | R² Change
Embedded -> BEAM | 1 | RDS-R | 40.7 (1) | <.001 |  |  | .74 |
Embedded -> BEAM | 2 | SacRT-IIV-Overall | 58.1 (2) | <.001 | 17.4 (1) | <.001 | .92 | .18
Freestanding^a -> BEAM | 1 | MSVT FR | 39.8 (1) | <.001 |  |  | .74 |
Freestanding^a -> BEAM | 2 | ManRT-IIV-Overall | 53.3 (2) | <.001 | 13.5 (1) | <.001 | .89 | .15
Embedded -> Freestanding^a -> BEAM | 1 | RDS-R | 40.7 (1) | <.001 |  |  | .74 |
Embedded -> Freestanding^a -> BEAM | 2 | MSVT FR | 52.5 (2) | <.001 | 11.8 (1) | .001 | .87 | .13
Embedded -> Freestanding^a -> BEAM | 3 | SacRT-IIV-Overall | ^b | ^b | ^b | ^b | ^b | ^b

Note: RDS-R = Reliable Digit Span-Revised; SacRT-IIV = saccadic reaction time intra-individual variability; MSVT = Medical Symptom Validity Test; FR = Free Recall % Correct; ManRT-IIV = manual reaction time intra-individual variability.
^a VSVT Difficult Correct and VSVT Total Correct had better classification accuracy than MSVT FR but could not be loaded into the regression models.

^b The model achieved exact classification accuracy and values could not be calculated.
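In Table 15, each χ² change equals the difference between successive model χ² values, for example 58.1 - 40.7 = 17.4 on 2 - 1 = 1 df. Each R² change is likewise the difference between successive R² values. The Python sketch below checks the reported p values for these one-degree-of-freedom steps using only the standard library. It relies on the identity that the upper tail of a χ² distribution with 1 df equals erfc(√(x/2)). This is a sketch under that assumption, not the study's analysis code, and the function names are hypothetical.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def block_change(prev_chi2: float, prev_df: int, new_chi2: float, new_df: int):
    """Hypothetical helper: (chi-square change, df change, p) for one added block.

    Only 1-df steps are handled, which covers every step reported in Table 15.
    """
    delta_df = new_df - prev_df
    if delta_df != 1:
        raise NotImplementedError("this sketch only handles 1-df block changes")
    delta_chi2 = new_chi2 - prev_chi2
    return delta_chi2, delta_df, chi2_sf_1df(delta_chi2)


# Model chi-square values (df 1 -> df 2) taken from Table 15
steps = [
    ("RDS-R -> SacRT-IIV-Overall", 40.7, 58.1),
    ("MSVT FR -> ManRT-IIV-Overall", 39.8, 53.3),
    ("RDS-R -> MSVT FR", 40.7, 52.5),
]
for label, prev_chi2, new_chi2 in steps:
    d, ddf, p = block_change(prev_chi2, 1, new_chi2, 2)
    print(f"{label}: chi-square change({ddf}) = {d:.1f}, p = {p:.4f}")
```

In this sketch, the first two steps fall below .001. The RDS-R to MSVT FR step gives a p value between .0005 and .001, which rounds to the .001 shown in the table.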
Table 16: Demographic characteristics of clinical comparison group

Characteristic | UR-mTBI (n=19) | F or χ² (df)^a | p
Mean age in years (SD) | 34.3 (13.3) | 1.96 | .15
Mean years of education (SD) | 16.2 (2.4) | 0.67 | .52
Estimated premorbid intelligence (SD) | 114 (7.7) | 0.77 | .47
Gender (%) |  | 0.54 (2) | .76
  Male | 7 (36.8) |  |
  Female | 12 (63.2) |  |
Race/ethnicity (%) |  | 9.22 (8) | .32
  Caucasian | 15 (78.9) |  |
  African-American | 2 (10.5) |  |
  Hispanic | 1 (5.3) |  |
  Asian | 1 (5.3) |  |
Injury characteristics
  Median years since injury (IQR) | 6.9 (2.32-21.6) |  |
  LOC length in minutes (SD) | 3.00 (4.29) |  |
  PTA length in minutes (SD) | 18.1 (50.8) |  |

Note: UR-mTBI = unbiased responders with a history of mild TBI; IQR = interquartile range; LOC = loss of consciousness; PTA = posttraumatic amnesia; SD = standard deviation.

^a One-way ANOVA or chi-square with BR, UR, and UR-mTBI groups.
Table 17: Group comparisons of normally distributed BEAM and embedded RVT variables with excellent classification accuracy

1) UR | 2) UR-mTBI | 3) BR | ANOVA | Cohen’s effect sizes (d)

Table 18: Group comparisons of non-normally distributed BEAM and embedded RVT variables with excellent classification accuracy

1) UR | 2) UR-mTBI | 3) BR | Kruskal-Wallis | r
Table 19: Cumulative percentages of persons with scores above the indicated cutoff on selected BEAM variables

Variable / Cutoff | BR^a (%) | UR-mTBI^b (%) | UR^c (%)

SacRT-IIV-DC (sec)
>0.154 | 38 |  |
0.153 | 38 | 5 |
0.151 | 42 | 5 |
0.147 | 46 | 5 |
0.142 | 50 | 5 |
0.137 | 54 | 5 |
0.131 | 58 | 5 |
0.130 | 63 | 5 |
0.129 | 67 | 5 |
0.128 | 71 | 5 |
0.127 | 71 | 5 | 4
0.126 | 75 | 5 | 4
0.125 | 79 | 5 | 4
0.119 | 79 | 5 | 8
0.113 | 79 | 5 | 12
0.110 | 79 | 11 | 12
0.109 | 79 | 11 | 15
0.104 | 79 | 16 | 15

SacRT-IIV-MDC (sec)
>0.189 | 29 |  |
0.179 | 29 | 5 |
0.173 | 33 | 5 |
0.169 | 38 | 5 |
0.167 | 42 | 5 |
0.165 | 46 | 5 |
0.162 | 50 | 5 |
0.159 | 54 | 5 |
0.152 | 58 | 5 |
0.144 | 63 | 5 |
0.142 | 67 | 5 |
0.139 | 71 | 5 |
0.135 | 71 | 11 |
0.134 | 75 | 11 |
0.133 | 75 | 16 |
0.131 | 75 | 21 |
0.129 | 79 | 21 |
0.126 | 83 | 26 |
0.123 | 83 | 26 | 4
0.122 | 88 | 26 | 4
0.119 | 88 | 26 | 8
0.115 | 88 | 26 | 12
0.113 | 88 | 32 | 12
0.112 | 88 | 32 | 15

SacRT-IIV-UCG (sec)
>0.177 | 13 |  |
0.174 | 13 | 5 |
0.169 | 17 | 5 |
0.162 | 22 | 5 |
0.157 | 26 | 5 |
0.154 | 30 | 5 |
0.152 | 35 | 5 |
0.151 | 39 | 5 |
0.148 | 43 | 5 |
0.146 | 48 | 5 |
0.143 | 52 | 5 |
0.138 | 57 | 5 |
0.137 | 61 | 5 |
0.136 | 65 | 5 |
0.131 | 70 | 5 |
0.125 | 74 | 11 |
0.122 | 78 | 11 |
0.117 | 78 | 11 | 4
0.115 | 78 | 16 | 4
0.113 | 78 | 21 | 4
0.109 | 78 | 26 | 8
0.102 | 78 | 32 | 12
0.099 | 78 | 32 | 15
0.097 | 83 | 32 | 15
0.095 | 87 | 32 | 15

SacRT-IIV-Overall (sec)
>0.135 | 63 |  |
0.132 | 67 | 5 |
0.130 | 71 | 11 |
0.125 | 75 | 16 |
0.120 | 79 | 21 |
0.118 | 79 | 21 |
0.117 | 83 | 21 | 4
0.116 | 83 | 21 | 4
0.115 | 88 | 21 | 8
0.113 | 88 | 26 | 8
0.112 | 88 | 26 | 12
0.108 | 88 | 37 | 15
0.106 | 92 | 37 | 15
0.102 | 96 | 42 | 15

ManRT-IIV-DC (sec)
>0.137 | 74 |  |
0.130 | 78 |  |
0.127 | 83 |  |
0.126 | 83 |  |
0.123 | 83 | 5 | 4
0.122 | 83 | 11 | 4
0.121 | 83 | 16 | 4
0.120 | 83 | 21 | 8
0.117 | 87 | 26 | 8
0.112 | 91 | 32 | 8
0.105 | 91 | 32 | 8
0.094 | 91 | 37 | 12
0.093 | 91 | 37 | 15

ManRT-IIV-NDC (sec)
>0.141 | 39 | 0 |
0.140 | 39 | 5 |
0.138 | 43 | 5 |
0.134 | 48 | 5 |
0.129 | 52 | 5 |
0.126 | 57 | 5 |
0.119 | 61 | 5 |
0.117 | 61 | 11 |
0.113 | 65 | 11 |
0.111 | 65 | 11 | 4
0.110 | 70 | 11 | 4
0.109 | 74 | 11 | 4
0.108 | 78 | 11 | 4
0.106 | 83 | 11 | 4
0.104 | 87 | 11 | 4
0.103 | 87 | 16 | 4
0.101 | 91 | 16 | 4
0.099 | 91 | 21 | 8
0.094 | 91 | 26 | 15
0.092 | 96 | 26 | 15

ManRT-IIV-MDC (sec)
>0.134 | 56 |  |
0.131 | 56 | 5 |
0.127 | 61 | 5 |
0.125 | 67 | 5 |
0.123 | 72 | 5 |
0.120 | 72 | 11 | 4
0.118 | 72 | 11 | 8
0.117 | 72 | 16 | 8
0.116 | 78 | 16 | 8
0.114 | 78 | 16 | 12
0.111 | 83 | 16 | 15

ManRT-IIV-UCG (sec)
>0.151 | 5 |  |
0.148 | 5 | 5 |
0.147 | 10 | 5 |
0.145 | 14 | 5 |
0.142 | 14 | 11 |
0.140 | 19 | 11 |
0.138 | 24 | 11 |
0.135 | 29 | 11 |
0.132 | 33 | 11 |
0.131 | 38 | 11 |
0.130 | 43 | 11 |
0.129 | 43 | 11 | 4
0.128 | 48 | 11 | 4
0.127 | 52 | 11 | 4
0.126 | 57 | 11 | 4
0.125 | 62 | 11 | 4
0.123 | 67 | 16 | 4
0.121 | 67 | 21 | 4
0.118 | 71 | 21 | 4
0.115 | 76 | 21 | 4
0.112 | 76 | 26 | 8
0.107 | 81 | 26 | 8
0.103 | 86 | 26 | 8
0.100 | 86 | 26 | 12
0.097 | 86 | 32 | 15
0.096 | 90 | 32 | 15
0.095 | 95 | 32 | 15

ManRT-IIV-UC (sec)
>0.150 | 22 |  |
0.146 | 26 |  |
0.140 | 26 |  | 4
0.139 | 26 | 5 | 4
0.136 | 30 | 5 | 4
0.134 | 35 | 5 | 4
0.132 | 39 | 5 | 4
0.131 | 43 | 5 | 4
0.130 | 48 | 5 | 4
0.126 | 52 | 5 | 4
0.122 | 57 | 5 | 4
0.121 | 57 | 11 | 4
0.119 | 61 | 11 | 4
0.117 | 61 | 11 | 8
0.116 | 61 | 16 | 8
0.111 | 65 | 16 | 8
0.108 | 65 | 21 | 8
0.107 | 65 | 26 | 8
0.105 | 65 | 32 | 12
0.104 | 74 | 32 | 15

ManRT-IIV-Overall (sec)
>0.128 | 70 |  |
0.127 | 74 |  |
0.122 | 78 |  |
0.118 | 78 | 5 | 4
0.117 | 78 | 11 | 8
0.113 | 83 | 16 | 8
0.111 | 83 | 16 | 8
0.110 | 83 | 21 | 8
0.106 | 87 | 21 | 8
0.104 | 91 | 26 | 8
0.100 | 91 | 32 | 8
0.099 | 91 | 32 | 8
0.094 | 96 | 37 | 8
0.093 | 96 | 37 | 12
0.092 | 96 | 37 | 15

Saccadic Commissions (% of DCR trials)
>61.5 | 54 |  |
57.9 | 58 | 5 |
55.4 | 63 | 5 |
53.3 | 67 | 5 |
52.6 | 71 | 5 |
47.9 | 75 | 5 |
45.7 | 75 | 11 |
43.5 | 79 | 16 |
43.1 | 83 | 16 | 4
41.8 | 83 | 16 | 8
33.2 | 83 | 21 | 8
32.9 | 83 | 21 | 12
24.0 | 83 | 26 | 15

Manual Omissions (% of non-DCR trials)
>30.4 | 21 |  |
28.5 | 21 | 5 |
25.5 | 25 | 5 |
20.8 | 29 | 5 |
17.4 | 29 | 5 | 4
17.3 | 33 | 5 | 4
17.1 | 38 | 5 | 4
14.1 | 42 | 5 | 4
10.7 | 46 | 5 | 4
9.7 | 50 | 5 | 4
8.2 | 54 | 5 | 4
7.9 | 54 | 11 | 4
7.1 | 58 | 11 | 4
5.9 | 67 | 11 | 4
5.2 | 71 | 11 | 4
4.3 | 75 | 11 | 4
3.3 | 79 | 11 | 4
3.0 | 83 | 11 | 4
2.7 | 88 | 11 | 4
2.4 | 92 | 11 | 4
1.7 | 92 | 11 | 8
1.0 | 92 | 11 | 12

Note. BR = biased responders; UR-mTBI = unbiased responders with a history of mild TBI; UR = unbiased responders; Sac = saccadic; Man = manual; RT = reaction time; IIV = intra-individual variability; DC = directional cue; NDC = nondirectional cue; MDC = misdirectional cue; UCG = uncued with gap; UC = uncued.
^a n=24 for all BR variables except Saccadic RT-IIV-UCG (n=23), Manual RT-IIV-DC (n=23), Manual RT-IIV-NDC (n=23), Manual RT-IIV-MDC (n=18), Manual RT-IIV-UCG (n=21), Manual RT-IIV-UC (n=23), and Manual RT-IIV-Overall (n=23).
^b n=19 for all UR-mTBI variables.
^c n=26 for all UR variables.
Table 20: Cumulative percentages of persons with scores above or below the indicated cutoff on selected embedded RVT variables

Test Variable / Cutoff | BR (n=24) | UR-mTBI (n=19) | UR (n=26)

WAIS-IV: ACSS Digit Span
<5 | 33 |  |
6 | 46 |  | 4
7 | 67 |  | 4
8 | 71 | 5 | 4
9 | 92 | 16 | 12
10 | 92 | 32 | 15

RDS
<6 | 42 |  |
7 | 54 |  | 4
8 | 67 | 5 | 4
9 | 92 | 16 | 12

RDS-R
<11 | 54 |  |
12 | 63 |  | 4
13 | 75 | 5 | 4
14 | 88 | 16 | 8
15 | 92 | 37 | 12

ARDS
<7 | 33 |  |
8 | 50 | 5 |
9 | 67 | 16 |
10 | 88 | 26 | 8
11 | 96 | 53 | 23

CPT-II Commissions (raw)
>22 | 42 |  |
21 | 67 | 5 |
20 | 71 | 16 |
19 | 75 | 16 | 4
18 | 79 | 26 | 4
17 | 79 | 32 | 12
16 | 83 | 42 | 12
15 | 83 | 47 | 15

Omissions (raw)
>6 | 63 |  |
5 | 67 | 5 |
4 | 67 | 11 | 4
3 | 71 | 16 | 15

Hit RT SE (sec)
>7.9 | 50 |  |
7.2 | 50 | 5 |
6.9 | 50 | 11 |
6.7 | 50 | 11 | 4
6.4 | 58 | 11 | 4
6.0 | 58 | 16 | 4
5.6 | 71 | 16 | 8
5.5 | 71 | 21 | 8
5.3 | 75 | 26 | 12
5.0 | 88 | 26 | 15

Note. BR = biased responders; UR-mTBI = unbiased responders with a history of mild TBI; UR = unbiased responders; WAIS-IV = Wechsler Adult Intelligence Scale-Fourth Edition; ACSS = Age Corrected Scaled Score; RDS = Reliable Digit Span; RDS-R = Reliable Digit Span-Revised; ARDS = Alternative Reliable Digit Span; CPT-II = Conners’ Continuous Performance Test-II.
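Tables 19 and 20 report the cumulative percentage of each group scoring beyond each cutoff. Because the groups are small, one participant moves a percentage by roughly 4 points in the BR (n=24) and UR (n=26) groups and by roughly 5 points in the UR-mTBI group (n=19). The Python sketch below shows one way such a column could be built from raw scores. It assumes a score counts as beyond a cutoff when it is at or past it (>= for upper cutoffs, <= for lower ones). The helper name is hypothetical, and the example scores are invented for illustration rather than taken from the study.

```python
from typing import Iterable, List, Literal, Sequence, Tuple


def cumulative_percentages(
    scores: Sequence[float],
    cutoffs: Iterable[float],
    direction: Literal["above", "below"] = "above",
) -> List[Tuple[float, int]]:
    """Hypothetical helper: whole-number % of scores at or beyond each cutoff.

    Assumption: with direction="above" a score counts when score >= cutoff;
    with direction="below" it counts when score <= cutoff.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if direction not in ("above", "below"):
        raise ValueError("direction must be 'above' or 'below'")
    n = len(scores)
    rows = []
    for cutoff in cutoffs:
        if direction == "above":
            count = sum(1 for s in scores if s >= cutoff)
        else:
            count = sum(1 for s in scores if s <= cutoff)
        rows.append((cutoff, round(100 * count / n)))
    return rows


# Invented scores for 19 hypothetical participants (not study data)
example = [0.090, 0.100, 0.110, 0.120, 0.095, 0.105, 0.115, 0.098, 0.102, 0.108,
           0.111, 0.093, 0.097, 0.101, 0.104, 0.107, 0.112, 0.099, 0.160]
for cutoff, pct in cumulative_percentages(example, [0.15, 0.12, 0.11]):
    print(f">= {cutoff:.2f}: {pct}%")
```

Rounding to whole percentages is why values such as 5, 11, 16, and 21 recur in the UR-mTBI column: they are 1, 2, 3, and 4 of 19 participants.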
APPENDIX B: FIGURES


Figure 1: Stepwise Logistic Regression Illustration for BEAM Variables
Figure 2: Variable Reduction Depiction for Simulator Study Variables
Figure 3: Saccadic RT-IIV-Directional Cue Sensitivity and Specificity by Cutoff Score
Figure 4: Manual Omission Error Rate Sensitivity and Specificity by Cutoff Score
Figure 5: Manual RT-IIV-Misdirectional Cue Sensitivity and Specificity by Cutoff Score
APPENDIX C: SUPPLEMENTAL MATERIALS


Picture 1: Computer Monitor, ASL D6 Eye Tracker, and Cedrus Response Pad


Picture 2: ASL EYE-TRAC 6 Control Unit and Computers
Picture 3: Examiner Station with ASL LCD Monitors
Script 1: Simulator Study Group Assignment Scripts


The  following  group  assignment  scripts  were  adapted  from  several  dissimulation  studies 
(71;  251;  257;  279): 

Unbiased  Responding  (UR)  Group  Script: 

In  this  study,  you  will  be  asked  to  complete  several  tests  of  memory  and  attention 
that  are  often  used  to  evaluate  people  who  sustain  head  injuries.  Your  part  in  this  project 
is  to  imagine  that,  six  months  ago,  you  were  involved  in  a  motor  vehicle  accident  in 
which you hit your head, but you were not injured at all. Today, you feel normal and
unharmed. You have been asked to undergo a routine neuropsychological evaluation to
help confirm that there are no problems. Your goal is to prove that your cognitive
abilities are normal by performing as well as you possibly can.

Please  follow  instructions  carefully  and  complete  all  tasks  to  the  best  of  your 
ability.  Try  to  do  the  best  you  can.  Do  you  understand  what  you  are  to  do?  Do  you  have 
any  questions? 

Biased  Responding  (BR)  Group  Script: 

In  this  study,  you  will  be  asked  to  complete  several  tests  of  memory  and  attention 
that  are  often  used  to  evaluate  people  who  sustain  head  injuries.  Your  part  in  this  project 
is  to  take  the  tests  while  playing  the  role  of  a  person  who  is  exaggerating  their  problems 
associated  with  concussions  or  head  injuries.  Some  individuals  who  sustain  traumatic 
brain injuries feel normal, or unharmed, following their injury, but they may simulate
injury or exaggerate symptoms to obtain financial rewards. We want to know what this
faked  performance  looks  like. 

Imagine  that,  six  months  ago,  you  were  involved  in  a  motor  vehicle  accident  in 
which you hit your head. Even though you feel normal today, you know that the amount
of money you will receive from your insurance company depends on how badly you were
injured. You will try to get extra money by exaggerating your problems on the tests you
are about to take. In other words, you are to alter your performance to suggest that your
cognitive functioning has been impaired from the head injury you sustained in the
accident.

Your goal is to produce the most severely impaired performance you can
WITHOUT  the  examiner  knowing  that  you  are  faking  or  pretending.  Imagine  that  if  you 
pretend  well  enough,  you  will  receive  a  large  sum  of  money,  but  if  you  are  caught,  you 
will  get  nothing.  Keep  in  mind  that  the  deficits  you  portray  must  be  believable.  Major 
exaggerations,  such  as  remembering  absolutely  nothing,  are  easy  to  detect.  The  tests  you 
are  about  to  take  have  ranges  of  scores  associated  with  brain  damage,  but  also  ranges  of 
scores  associated  with  faking  bad.  Therefore,  if  you  magnify  your  symptoms  too  much 
and  they  are  too  obvious,  the  tests  will  identify  you  as  someone  trying  to  fake  bad  rather 
than  someone  who  is  head  injured. 

Remember,  you  have  to  be  convincing  in  your  performance.  This  is  going  to  take 
some  skill  on  your  part.  You  will  have  to  remind  yourself  throughout  the  testing  what  you 
are  trying  to  do. 

Do  you  understand  what  you  are  to  do?  Do  you  have  any  questions? 
Script 2: Simulator Study Debrief Scripts


The following debrief scripts were adapted from several dissimulation studies
(71; 251; 257; 279):

Unbiased  Responding  (UR)  Debrief  Script: 

Thank  you  for  participating  in  this  study.  Your  research  results  will  be  combined 
with  others  who  also  gave  their  best  effort,  and  then  compared  to  a  group  of  participants 
who  were  asked  to  perform  as  if  they  had  sustained  a  traumatic  brain  injury.  Your 
participation  today  will  help  us  identify  what  elements  of  our  eye  tracking  measure  are 
affected  by  test-taking  effort,  so  that  we  can  use  this  tool  most  effectively  to  assess  brain 
functioning  after  injury.  Do  you  have  any  questions  or  concerns? 

Biased  Responding  (BR)  Debrief  Script: 

Thank  you  for  participating  in  this  study.  Your  research  results  will  be  combined 
with  others  who  also  simulated  the  effects  of  brain  injury,  and  then  compared  to  a  group 
of  participants  who  were  asked  to  give  their  best  effort.  Your  participation  today  will 
help  us  identify  what  elements  of  our  eye  tracking  measure  are  affected  by  test-taking 
effort,  so  that  we  can  use  this  tool  most  effectively  to  assess  brain  functioning  after 
injury.  Do  you  have  any  questions  or  concerns? 
Measure 1: Simulator Study Feedback Interview


Phase 4 Feedback Interview

(Administered by lab member OTHER THAN the examiner)

Participant ID: _ Date: _

1) Participant’s group assignment (circle one) Best effort Simulated TBI

2) “What was your strategy for taking the...

A) Eye tracking measure?”

B) CPT?”

C) VSVT?”

D) Trail A?”

E) Trail B?”

F) Digit Span?”

G) MSVT?”

H) King-Devick Test?”

For Simulated TBI group only:

3) “On a scale from 1 to 5, where 5 is the most confident, how confident are you that you
performed within the range of a head-injured person on the following tests?”


Eye tracking | CPT | VSVT | Trail A | Trail B | Digit Span | MSVT | K-D Test

4) “Do you have any questions or concerns about the study?”

The following questions are meant for the examiner:

(DO NOT tell the participant’s group before or after these questions are asked)

1) “What is your best guess of the participant’s group assignment?”

(circle one) Best effort Simulated TBI

2) “On a scale from 1 to 5, where 5 is the most confident, how confident are you about
your group assignment guess?”
Measure 2: Simulator Study Baseline Interview


UNIFORMED SERVICES UNIVERSITY
of the Health Sciences

Eye Tracking Baseline Interview - Phase 4

Date: _ / _ / _ PID: _ Supervisor: _ Examiner: _ Entered By:


Civilian  Demographics 


[l_age] 

Age: 

[l_yob] 

Year  of  Birth: 

[l_handed] 

Handedness: 

1)  Right  2)  Left  3)  Ambidextrous/Mixed 

[l_gender]

Gender: 

Male  Female 

[l_race] 

Race: 

1)  White  5)  Native  Hawaiian  or  Pacific  Islander 

2)  Hispanic  6)  American  Indian  or  Alaska  Native 

3)  Asian  7)  Other 

4)  Black  or  African  American 

[l_ethnic] 

Ethnicity: 

0)  Not  Hispanic  or  Latino  1)  Hispanic  or  Latino 

[l_marital]

Marital  Status 

0)  Single  3)  Divorced 

1)  Married/Legally  Partnered  4)  Widowed 

2)  Legally  Separated  5)  Other 

Military  Demographics 


[l_branch] 

Branch: 

1)  Marine  Corps  4)  Air  Force 

(ever/most  recent) 

2) Army 5) Coast Guard

3) Navy 9999) Never served

[l_rank]

Highest  Rank  Achieved 

[l_veteran] 

Veteran  of: 

1)  OEF  (Afghanistan)  3)  OEF/OIF  (Both) 

2)  OIF  (Iraq)  4)  Other  (Specify) 

[l_deploy_length]

Total  length  of  deployment  (in  months) 

Ettenhofer Neurocognitive Laboratory
USUHS-MPS
MLE.ET.Int.10.25.12.P4


Participant ID:


[l_duty] 

Duty Status: 0) N/A 1) Reserve 2) Guard 3) Active 4) Separated

[l_duty_date] 

Date  of  separation  from  duty: 

Educational  Background 


[l_edu] 

Number  of  years  of  education:  (or  number  of  years  before  GED,  if  applicable) 

GED?  0)  No  1)  Yes 

[l_us_edyrs] 

Number  of  years  educated  in  the  US: 

[l_ld]

History  of  diagnosed  learning  disability?  0)  No  1)  Yes 

Specify 

[l_adhd] 

History of ADHD diagnosis before 18? 0) No 1) Yes

Language  Spoken 


[l_langprim] 

Primary  Language  Spoken: 

1)  English  2)  English  as  a  Second  Language  3)  Learned  both  same  time 

[Ijanglst] 

What was your first language (if other than English)?

1)  Spanish  2)  Other  (specify)  9999)  N/A 

[l_age_eng] 

Age  (in  years)  when  first  learned  English:  (zero  if  from  birth) 

Functional  Information 


[l_employ] 

Are  you  currently  employed?  (Not  including  inactive  Guard/Reserve) 

0)  No  2)  Yes  -  Part-time  (non-military) 

1)  Yes  -  U.S.  Military  3)  Yes  -  Full-time  (non-military) 

[l_curred] 

Are  you  currently  enrolled  in  any  educational  programs?  (non-related  to  any  rehab  programs) 

0)  No  1)  Yes  -  Part-time  2)  Yes  -  Full-time 

[l_disab] 

Do  you  currently  receive  disability  compensation  of  any  kind? 

0)  No  1)  Yes  -  Specify: 

Participant ID:


Other  Medical  History 


[l_birthcom] 

Were  there  any  medical  complications  at  the  time  of  your  birth? 

0)  No  1)  Yes  -  If  so,  did  these  affect  your  health  afterward?  How? 

[l_birthcom_text]

Specify: 

Do  you  have  a  history  of  any  of  the  following  conditions? 

No  Yes 

[l_tumor] 

0  1 

Tumor 

[l_thyroid] 

0  1 

Thyroid  Disorder 

[ l_cereb_infect ] 

0  1 

Brain  Infection 

[l_headache] 

0  1 

Headache  (pre-TBI/no  TBI) 

[l_headacheTBI] 

N/A 

0  1 

Headache  (post-TBI  or  circle  N/A  if  no  TBI  reported) 

[l_vision_prob]

0  1

Vision Problems

Specify: [l_vision_text1]

[l_vision_correct]

0  1

Vision Corrected

Specify: [l_vision_text2]

[l_seizure]

0  1

Seizures (pre-TBI/no TBI)

[l_seizureTBI] 

N/A 

0  1 

Seizures  (post-TBI  or  circle  N/A  if  no  TBI  reported) 

[l_stroke]

0  1 

Stroke 

[l_toxic_exp] 

0  1 

Toxic  Exposure 

[l_cvd/_date]

0  1 

Cardiovascular  Disease  Dx  Date: 

[l_hypertens/_date] 

0  1 

Diagnosis  of  Hypertension  Dx  Date: 

[l_heartattack] 

0  1 

Heart  Attack 

[l_cad/_date]

0  1 

Coronary  Artery  Disease  Dx  Date: 

[l_diabetes]

0  1 

Diabetes 

[l_smoking/_daily]

0  1

Smoking Cigs/day:

[l_medother_text]

0  1

Other Medical history?
Participant ID:


[l_sleep_lastnight]
[l_sleep_typical]

[l_fatigue]


Do  you  take  any  medications? 

Medication  Name/Purpose 

Dosage 

Frequency 

Hours  Since  Last  Dose 

[med1]

1) 

[med2] 

2) 

[med3] 

3) 

[med4] 

4) 

[med5] 

5) 

[med6] 

6) 

[med7] 

7) 

[med8] 

8) 

[med9]

9)

[med10]

10)

[l_med_psych]

Any psychiatric medications as listed? 0) No 1) Yes

[l_med_pain]

Any pain medications as listed? 0) No 1) Yes

[l_med_sleep]

Any  sleep  medications  as  listed?  0)  No  1)  Yes 

How  many  hours  of  sleep  did  you  get  last  night?  _ 

How  many  hours  of  sleep  per  night  has  been  typical  for  you  in  the  last  month?  _ 

On  a  scale  of  one  to  ten  (ten  being  the  most  tired),  what  is  your  current  level  of  fatigue? 

1 2 3 4 5 6 7 8 9 10
Participant ID:


Substances 


Have  you  had  any  of  these  within  the  last  12  hours? 

No  Yes 

[l_alcohol] 

0  1 

Alcohol  #  Drinks  Hours  since  use 

[l_nicotine] 

0  1 

Nicotine  #  Cigs  Hours  since  use 

[l_caffeine] 

0  1 

Caffeine  #  Drinks  Hours  since  use 

Individuals  who  have  had  more  than  i  drinks  within  the  last  12  hours,  or  any  drinks 
within  the  last  4  hours  should  NOT  be  tested  (reschedule  testing) 


Notes: 
Measure 3: Parent Study TBI Cohort Baseline Interview


Eye Tracking Baseline Interview - Phase 3


Date:  /  /  PID:  Examiner:  Entered  By: 

General  Demographics 

[l_age]

Age:

[l_yob]

Year of Birth:

[l_handed]

Handedness:  1)  Right  2)  Left  3)  Ambidextrous/Mixed 

[l_gender] 

Gender:  1)  Male  2)  Female 

[l_race]

Race:  1)  White  5)  Native  Hawaiian  or  Pacific  Islander 

6)  American  Indian  or  Alaska  Native 

[l_race_text] 

3)  Asian  7)  Other 

4)  Black  or  African  American 

[l_ethnic]

Ethnicity: 0) Not Hispanic or Latino 1) Hispanic or Latino

[l_marital]

Marital Status: 0) Single 3) Divorced

1) Married/Legally Partnered 4) Widowed

[l_marital_text] 

2)  Legally  Separated  5)  Other 

Military  Demographics 


[l_branch]

[l_branch_txt]

Branch:

(ever/most recent)

1) Marine Corps 4) Air Force 0) Never served

2) Army 5) Coast Guard

3) Navy 6) Other

[l_rank]

Highest Rank Achieved:

[l_veteran]

Veteran of:

1) OEF (Afghanistan) 3) OEF/OIF (Both) 0) None

2)  OIF  (Iraq)  4)  Other  (Specify) 

Ettenhofer Neurocognitive Laboratory MLE.ET.Int.06.15.12.P3

Participant ID:


[l_deploy_length] 

Total  length  of  all  deployments  (in  months):  0)  None 

[Icombatlength] 

Total length of deployment in a combat zone (in months): 0) None

[l_duty]

Duty Status: 9999) N/A 1) Reserve 2) Guard 3) Active 4) Separated

[l_duty_date]

Date of separation from duty: / _

[l_sep_text] Conditions of separation:

Educational  Background 


[l_edu]

Highest level of education: (or number of years before GED, if applicable)

[l_ged]

GED (versus high school diploma): 0) No 1) Yes

[l_us_edyrs]

Number of years educated in the US:

[l_ld]

History of diagnosed learning disability: 0) No 1) Yes

[l_ldt_text]

Specify

[l_adhd]

History of ADHD diagnosis before 18: 0) No 1) Yes

Language Spoken

[l_langprim]

Was English your second language?

1) No 2) Yes 3) Learned both languages at the same time

[l_1stlang]

What was your first language (if other than English)?

[l_1stlang_text]

1) Spanish 2) Other (specify) 9999) N/A

[l_age_eng]

Age (in years) when first learned English: (zero if from birth)

Functional  Information 


[l_employ] 

Are  you  currently  employed?  (Not  including  inactive  Guard/Reserve) 

0)  No  2)  Yes  -  Part-time  (non-military) 

1)  Yes  -  U.S.  Military  3)  Yes  -  Full-time  (non-military) 

[l_curred]

Are  you  currently  enrolled  in  any  educational  programs?  (non-related  to  any  rehab  programs) 

0)  No  1)  Yes  -  Part-time  2)  Yes  -  Full-time 
Participant ID:

[l_disab]

Do you currently receive disability compensation of any kind?

0) No 1) Yes - Specify: [l_disab_text]

Do you currently receive help (from family, paid helpers, etc.) or share responsibilities with
someone else for any of the following activities?

Activity | Current | Highest Previous
Self-care (washing, dressing) | [l_currselfcare] | [l_highselfcare]
Financial | [l_currfinancial] | [l_highfinancial]
Housework (laundry, cleaning) | [l_currhousework] | [l_highhousework]
Shopping | [l_currshopping] | [l_highshopping]
Response options for each item: 0) No help 1) Some 2) Majority/Total

[l_hobbies]

On a scale of 1-10 (with 10 being the highest), how satisfied are you with your current
involvement in personal hobbies and interests?

1 2 3 4 5 6 7 8 9 10

[l_social]

On a scale of 1-10 (with 10 being the highest), how satisfied are you with your current
involvement in social activities?

1 2 3 4 5 6 7 8 9 10

Most  Recent  Traumatic  Brain  Injury  History 


[l_num_tbi]

Have  you  ever  lost  consciousness  or  forgotten  information  as  a  result  of  an  explosion  or 
a  blow  to  the  head?  If  so,  how  many  times  has  this  occurred  throughout  the  course  of  your 
lifetime?  (note:  this  requires  LOC  or  PTA) 

[l_datetbi1]

When  was  the  most  recent  incident? 

_ /  /  Estimates  are  OK  N/A=  99/99/9999 

Participant ID:


[l_tbi1injury]

[l_tbi1injurytext]


[l_force1text]


Brief  description  of  the  MOST  RECENT  incident: 


Was  any  part  of  your  body  injured  as  a  result  of  this  incident,  other  than  your  head? 
9999)  N/A  0)  No  1)  Yes  Describe: _ 


Was  this  incident  during  a  military  deployment? 

1)  Before  any  deployment  4)  Between  deployments 

2)  During  deployment  9999)  N/A 

3)  After  deployment 


[l_activity1]

What was the cause of the incident? (choose only one closest answer, or “other”)

1)  Motor  vehicle  crash 

6)  Accidental  fall 

2)  Sports/Recreation 

7)  On  foot  during  blast 

3)  Assault  (person  on  person) 

8)  Other  (specify) 

[l_activity1_text]

4)  Hit  by  (moving)  projectile 

5)  In  vehicle  during  blast 

9999)  N/A  (no  TBI  reported) 

How  was  this  head  injury  acquired? 

Obj  hit  head 

Head  hit  object 

Blast  injury 

Projectile 

Other 

[l_force1ohh]

[l_force1hho]

[l_force1blast]

[l_force1proj]

[l_force1other]

0)  No 

0)  No 

0)  No 

0)  No 

0)  No 

1)  Yes 

1)  Yes 

1)  Yes 

1)  Yes 

1)  Yes 

9999) N/A 

9999)  N/A 

9999)  N/A 

9999)  N/A 

9999) N/A 

If  other,  then  specify: 




[1_helmet_tbi1]

Were  you  wearing  a  helmet  at  the  time  of  the  incident? 

0)  No  1)  Yes  9999)  N/A  (No  TBI  reported) 

[1_loctime1]

BE  AS  ACCURATE 

AS  POSSIBLE 

How  long  were  you  unconscious?  (according  to  incident  report,  if  available) 

DO  THE  MATH! 

Minutes  (Hours)  (Days)  CONVERT  TO  MINS! 

7777)  Unknown  (due  to  PTA)  8888)  Yes  LOC,  but  insufficient  info  9999)  N/A  (No  TBI) 

[1_locmin1]

LOC  Minimum  -  troubleshoot  as  needed 

(e.g.  if  injured  in  Iraq  and  woke  up  in  Germany,  implies  LOC>6  hours) 

DO  THE  MATH! 

Minutes  (Hours)  (Days)  CONVERT  TO  MINS! 

9999)  N/A  (No  TBI) 

[1_locmax1]

LOC  Maximum  -  troubleshoot  as  needed 

DO  THE  MATH! 

Minutes  (Hours)  (Days)  CONVERT  TO  MINS! 

9999)  N/A  (No  TBI) 

[1_ptatime1]

BE  AS  ACCURATE 

AS  POSSIBLE 

Is  any  time  "missing"  from  your  memory  after  the  injury?  If  so,  how  much? 

(use  PTA  according  to  incident  report,  if  available) 

DO  THE  MATH! 

Minutes  (Hours)  (Days)  CONVERT  TO  MINS! 

8888)  Yes  PTA,  but  insufficient  info  9999)  N/A  (No  TBI) 

[1_ptamin1]

PTA  Minimum  -  troubleshoot  as  needed 

DO  THE  MATH! 

Minutes  (Hours)  (Days)  CONVERT  TO  MINS! 

9999)  N/A  (No  TBI) 

[1_ptamax1]

PTA  Maximum  -  troubleshoot  as  needed  DO  THE  MATH! 

Minutes  (Hours)  (Days)  CONVERT  TO  MINS! 

9999)  N/A  (No  TBI) 



[1_punct1]

[1_punct1_text]

Was your skull punctured? 0) No 1) Yes 9999) N/A (no TBI reported)

If  so,  specify: 

[1_fract1]

[1_fract1_text]

Was  your  skull  fractured?  0)  No  1)  Yes  9999)  N/A  (no  TBI  reported) 

If  so,  specify: 

[1_tx1]

Did  you  receive  treatment  or  evaluation  shortly  after  the  time  of  the  injury? 

0)  No,  none  1)  Yes  9999)  N/A  (no  TBI  reported) 

[1_txh1]

Were  you  hospitalized  for  this  incident?  If  yes,  how  long  was  the  hospitalization? 

Note:  hospitalization  means  inpatient  admit  for  >24  hours 

Days  0)  No  Hospitalization 

8888)  Yes  hospitalization,  but  duration  unknown  9999)  N/A  (no  TBI  reported) 

[1_lgl1]

Have you ever been involved in litigation related to this injury?

0)  No  1)  Yes  9999)  N/A  (no  TBI  reported) 

Most  Severe  Traumatic  Brain  Injury  History 


[1_date_tbi2]

When  was  your  most  severe  head  injury?  DO  NOT  CODE  AGAIN  FOR  SAME  INJURY  AS  ABOVE 

Note:  most  severe  according  to  PTA/LOC  (etc.;  follow  up  as  needed) 

__/__/____ 99/99/9999) N/A

Brief  description  of  most  severe  head  injury  incident: 

[1_tbi2_text]

[1_tbi2injury]

Was  any  part  of  your  body  injured  as  a  result  of  this  incident,  other  than  your  head? 

9999) N/A 0) No 1) Yes Describe:

[1_tbi2injury_text]

[1_mil2]

Was  this  incident  during  a  military  deployment? 

1) Before any deployment

2) During deployment

3) After deployment

4) Between deployments

9999) N/A

[1_activity2]

What  was  the  cause  of  the  incident?  (choose  only  one  closest  answer,  or  "other") 

1) Motor vehicle crash

2) Sports/Recreation

3) Assault (person on person)

4) Hit by (moving) projectile

5) In vehicle during blast

6) Accidental fall

7) On foot during blast

8) Other - Specify: [1_activity2_text]

9999) N/A (no TBI reported)

How was this head injury acquired?

[1_force2ohh] Obj hit head 0) No 1) Yes 9999) N/A

[1_force2hho] Head hit object 0) No 1) Yes 9999) N/A

[1_force2blast] Blast injury 0) No 1) Yes 9999) N/A

[1_force2proj] Penetrated Skull 0) No 1) Yes 9999) N/A

[1_force2other] Other 0) No 1) Yes 9999) N/A

If other, then specify: [1_force2_text]

[1_helmet_tbi2]

Were you wearing a helmet at the time of the incident?

0) No 1) Yes 9999) N/A (No TBI reported)

[1_loctime2]

BE AS ACCURATE

AS POSSIBLE

How long were you unconscious? (refer to incident report, if available)

DO THE MATH!

Minutes (Hours) (Days) CONVERT TO MINS!

7777) Unknown (due to PTA) 8888) Yes LOC, but insufficient info 9999) N/A (No TBI)


[1_locmin2]

LOC  Minimum  -  troubleshoot  as  needed  DO  THE  MATH! 

(e.g.  if  injured  in  Iraq  and  woke  up  in  Germany,  implies  LOC>6  hours)  CONVERT  TO  MINS! 

Minutes  (Hours)  (Days) 

9999)  N/A  (No  TBI) 

[1_locmax2]

LOC  Maximum  -  troubleshoot  as  needed  DO  THE  MATH! 

CONVERT  TO  MINS! 

Minutes  (Hours)  (Days) 

9999)  N/A  (No  TBI) 

[1_ptatime2]

BE  AS  ACCURATE 

AS  POSSIBLE 

Is any time "missing" from your memory after the injury? If so, how much?

(use PTA according to incident report, if available)

DO THE MATH!

Minutes (Hours) (Days) CONVERT TO MINS!

8888) Yes PTA, but insufficient info 9999) N/A (No TBI)

[1_ptamin2]

PTA  Minimum  -  troubleshoot  as  needed  DO  THE  MATH! 

CONVERT  TO  MINS! 

Minutes  (Hours)  (Days) 

9999)  N/A  (No  TBI) 

[1_ptamax2]

PTA  Maximum  -  troubleshoot  as  needed  DO  THE  MATH! 

Minutes  (Hours)  (Days)  CONVERT  TO  MINS! 

9999)  N/A  (No  TBI) 

[1_punct2]

[1_punct2_text]

Was  your  skull  punctured?  0)  No  1)  Yes  9999)  N/A  (no  TBI  reported) 

If  so,  specify: 

[1_fract2]

[1_fract2_text]

Was your skull fractured? 0) No 1) Yes 9999) N/A (no TBI reported)

If  so,  specify: 



[1_tx2]

Did  you  receive  treatment  or  evaluation  shortly  after  the  time  of  the  injury? 

0)  No,  none  1)  Yes  9999)  N/A  (no  TBI  reported) 

[1_txh2]

Were  you  hospitalized  for  this  incident?  If  yes,  how  long  was  the  hospitalization? 

Note:  hospitalization  means  inpatient  admit  for  >24  hours 

Days  0)  No  Hospitalization 

8888)  Yes  hospitalization,  but  duration  unknown  9999)  N/A  (no  TBI  reported) 

[1_lgl2]

Have you ever been involved in litigation related to this injury?

0)  No  1)  Yes  9999)  N/A  (no  TBI  reported) 

Additional TBI Incidents

[1_blast_noTBI]

ESTIMATES  OK 

How  many  times  were  you  exposed  to  a  blast  where  you  felt  the  pressure  wave  on  your  body 
that  DID  NOT  result  in  you  losing  consciousness,  having  any  memory  loss,  or  feeling  confused 
or  disoriented? 

[1_blastTBI]

ESTIMATES  OK 

How  many  times  were  you  exposed  to  a  blast  that  DID  result  in  you  losing  consciousness, 
feeling  confused/disoriented,  or  having  any  memory  loss? 

[1_num_aoc]

ESTIMATES  OK 

[1_numaoc_text]

Have  you  ever  been  confused  or  disoriented  after  an  explosion  or  a  blow  to  the  head,  but 
did  NOT  lose  consciousness  or  experience  memory  loss?  If  so,  how  many  times  has  this 
occurred  throughout  the  course  of  your  life? 

(Note:  this  is  the  number  of  times  experiencing  AOC  without  LOC  or  PTA) 

If  yes,  specify: 

Other Medical History

[1_birthcom]

[1_birthcom_text]

Were  there  any  medical  complications  at  the  time  of  your  birth? 

0)  No  1)  Yes  -  If  so,  did  these  affect  your  health  afterward?  How? 

Specify: 

Do you have a history of any of the following conditions?

No Yes

[1_tumor] 0 1 Tumor

[1_thyroid] 0 1 Thyroid Disorder

[1_cereb_infect] 0 1 Brain Infection

[1_headache] 0 1 Headache (pre-TBI/no TBI)

[1_headacheTBI] N/A 0 1 Headache (post-TBI or circle N/A if no TBI reported)

[1_vision_prob] 0 1 Vision Problems Specify: [1_vision_text1]

[1_vision_correct] 0 1 Vision Corrected Specify: [1_vision_text2]

[1_seizure] 0 1 Seizures (pre-TBI/no TBI)

[1_seizureTBI] N/A 0 1 Seizures (post-TBI or circle N/A if no TBI reported)

[1_stroke] 0 1 Stroke

[1_toxic_exp] 0 1 Toxic Exposure

[1_cvd/_date] 0 1 Cardiovascular Disease Dx Date:

[1_hypertens/_date] 0 1 Diagnosis of Hypertension Dx Date:

[1_heartattack] 0 1 Heart Attack

[1_cad/_date] 0 1 Coronary Artery Disease Dx Date:

[1_diabetes] 0 1 Diabetes

[1_smoking/_daily] 0 1 Smoking Cigs/day:

[1_medother_text] 0 1 Other Medical history?



Have you ever been diagnosed with:

No Yes

[1_depression] 0 1 Clinical Depression

[1_ptsd] 0 1 PTSD

[1_anxiety] 0 1 Anxiety (not related to PTSD) Specify: [1_anxiety_text]

[1_psychotic] 0 1 Psychotic Disorders (e.g., Schizophrenia)

[1_psych_other] 0 1 Other - Specify: [1_psychother_text]

[1_psychtx]

Are you currently involved in a behavioral health program?

0) No 1) Yes Specify: [1_psychtx_text]

[1_cogtx]

Are you currently in a cognitive rehabilitation program? (follow up if they aren't sure)

0) No 1) Yes Specify: [1_cogtx_text]

Do  you  take  any  medications? 

Medication Name and Purpose    Dosage    Frequency    Hours Since Last Dose

[med1] 1)

[med2] 2)

[med3] 3)

[med4] 4)

[med5] 5)

[med6] 6)

[med7] 7)

[med8] 8)

[med9] 9)

[med10] 10)

Other Substances


Have you had any of these within the last 12 hours?

No Yes

[1_alcohol] 0 1 Alcohol # Drinks Hours since use

[1_nicotine] 0 1 Nicotine # Cigs Hours since use

[1_caffeine] 0 1 Caffeine # Drinks Hours since use

Individuals  who  have  had  more  than  J  alcoholic  drinks  within  the  last  12  hours,  or  any  drinks 
within  the  last  4  hours  should  NOT  be  tested  (reschedule  testing) 


GENERAL  NOTES: 


Measure 4: Head Injury Knowledge Scale Version B (HIKS B)


Experiences  Following  Brain  Injury:  Version  B 

Having  a  brain  injury  can  lead  to  a  range  of  changes  in  a  person's  everyday  abilities.  We  are 
interested  in  what  changes  you  believe  are  likely  to  occur  after  someone  has  had  a  brain 
injury. 

For  each  item,  please  circle  TRUE  if  you  think  this  would  occur  often  or  most  of  the  time  or 
FALSE  if  you  think  this  would  rarely  or  never  occur  following  a  brain  injury. 


Often or most of the time    Rarely or never

Have difficulty recognising faces of family members    True    False
Have trouble concentrating on more than one task at a time    True    False
Less movement or coordination down one side of the body    True    False
Have trouble remembering major events from childhood    True    False
Think and act as if you were a different person altogether    True    False
See things or images that are not really there    True    False
Have trouble finding the right words in conversation    True    False
Experience a reduced or loss of sense of smell    True    False
Have trouble saying the letters of the alphabet    True    False
Become frustrated more easily    True    False
Forget your own first name or surname    True    False
Feel tired more easily    True    False
Have trouble remembering how to get dressed    True    False
Say things without thinking them through first    True    False
Feel pins and needles in the face and both arms    True    False
Plan to do things in the future but do not follow them through    True    False
Become upset and yell for no reason    True    False
Have trouble remembering details of recent conversations    True    False

Copyright: Ownsworth, Ono & Walters 2009. Please do not use or distribute without the authors' prior
consent.
Administrative Document 1: Simulator Study Informed Consent Form


UNIFORMED SERVICES UNIVERSITY

OF THE HEALTH SCIENCES


INFORMED  CONSENT  FORM 
RESEARCH  STUDY 


Eye Tracking Indicators of Neurocognitive Status after Traumatic Brain Injury - Phase 4

INTRODUCTION 


You are being asked to take part in a research study. Before you decide if you want to be in the
study, you need to understand its risks and benefits so that you can make an informed decision.
This is known as informed consent.


This consent form provides information about the research study which has been explained to
you. Once you understand what it involves, you will be asked to tell the researcher if you want to
take part in it. Your decision to take part in the study is entirely voluntary. This means that you
are free to choose whether or not you want to be a research subject.

DESCRIPTION  OF  THE  RESEARCH  AND  ITS  PURPOSE 


The purpose of this study is to develop and test an eye-tracking tool to accurately diagnose
traumatic brain injury. This tool works by watching your eyes as you complete tasks on a
computer.

In this phase of the experiment, you will be randomly assigned to one of two groups, each with a
different set of instructions for how to complete a series of tasks. You will be asked to follow
the instructions to the best of your ability. All participants will complete computer tasks while
their eye movements are tracked. You will also complete a number of other thinking tasks.

This study is being conducted using funds from the Uniformed Services University of the Health
Sciences (USUHS).

The principal investigator for this study is: Mark L. Ettenhofer, Ph.D.

Department  of  Medical  and  Clinical  Psychology 
Uniformed  Services  University  of  the  Health  Sciences 
4301  Jones  Bridge  Road 
Bethesda, MD 20814-4712
301-295-3279 

Eligibility  to  Participate: 

You are being asked to be in this study because you were previously determined to be eligible
during the pre-screening procedure. During the pre-screening procedure, it was determined that
you are over the age of 18, and you do not have a medical condition that would be expected to
affect your eye or brain functioning or your use of your hands.
If you are Active Duty Military or a civilian federal employee, it is also required that you provide
a signed Statement of Approval for Participation in Research. Active Duty Military personnel
must have this approval signed by their supervisor and the Brigade Commander; Federal
Civilians must have this signed by their Supervisor before any research participation.

STUDY  PROCEDURE: 

Your  participation  in  this  study  will  require  1  study  visit  that  will  last  about  1.5  hours. 

If you agree to participate, you will sign this consent form after it has been explained to you and
before any study-related procedures take place.

We will collect personal information about you (your name, address, phone number, and email
address) so that we may compensate you for participation in this study. You will be randomly
assigned to one of two groups, each with a different set of instructions for how to complete a
series of tasks. You will be asked to follow the instructions to the best of your ability. You will
complete tasks, procedures, and questionnaires during this time to measure your thinking (e.g.,
attention, reaction time, memory). You may refuse to answer any questions that make you feel
uncomfortable, or withdraw from the study at any time. You will also complete a series of
computer tasks, about 45 minutes in duration, during which your eyes will be tracked by a
camera. The computer will record your eye movements while you complete the tasks.

POSSIBLE  BENEFITS 

The information researchers get from this study may help others in the future. You might not
personally benefit from being in this study.

COMPENSATION 

If you are active duty military or a federal employee, you are not eligible for compensation.
Otherwise, you will be compensated $30 for your participation, paid by check after your
participation visit is complete. The check will be mailed if requested.

POSSIBLE  RISKS 

There are no known or expected risks for participating in this study, but you could have side
effects that we do not expect or know to watch for now. Call the principal investigator if you
have any symptoms or problems.

There is a risk that one or more of these questions or tasks might make you upset or
uncomfortable. If this happens, remember that you will not need to respond to any questions,
complete any tasks, or follow any instructions that make you feel upset or uncomfortable. You
may also discontinue participation at any time without penalty.

Referrals 

USUHS Grant #R072LP-SS3 Participant's Initials ____ Date ____
If we feel it is needed or you request it, we will provide you with referrals to a mental health care
provider for evaluation or treatment at your option and your expense. These referrals may be
provided up to one week from your visit if the principal investigator judges that you may benefit
from these services based upon evidence of mental health difficulties. However, this study is
not intended to diagnose or treat any conditions. Non-referral does not imply the absence of a
mental health condition.

RIGHT  TO  WITHDRAW  FROM  THE  STUDY 

You may decide to stop taking part in this study at any time. This will not affect your
relationship with USUHS in any way. You can agree to be in the study now and change your
mind later. If you decide to withdraw, you may do so in person or over the phone by calling Dr.
Ettenhofer at 301-295-3279. However, phone calls and any formal withdrawals must be
accompanied by a written, signed request including your full printed name, and sent to Dr.
Ettenhofer at the address listed above. Your participation may also be discontinued by study
personnel for reasons including, but not limited to, your potential difficulty following study
procedures. If requested, we will also destroy any information we have collected about you.

PRIVACY AND CONFIDENTIALITY

All information you provide as part of this study will be confidential and will be protected to the
fullest extent provided by law. Your records related to this study will be accessible to those
persons directly involved in conducting this study and members of the Uniformed Services
University of the Health Sciences Institutional Review Board (IRB), which provides oversight
for protection of human research volunteers. In addition, the Institutional Review Board at
USUHS and other federal agencies who help protect people who are involved in research studies
may need to see the information you give us. Other than those groups, records from this study
will be kept private to the fullest extent of the law. Scientific reports that come out of this study
may use the information you have provided, but these reports will not use your name or identify
you in any way.

Records of your participation in this study may only be disclosed in accordance with federal law,
including the Federal Privacy Act, 5 U.S.C. 552a, and its implementing regulations.
Confidentiality of your records will be protected to the extent possible under existing regulations
and laws but cannot be guaranteed. Complete confidentiality cannot be promised, particularly for
military personnel, because information bearing on your health may be required to be reported to
appropriate medical or command authorities.

Personal contact information may be retained for the purposes of completing this study and to
notify you of future studies and assess your interest in participation. You will only be contacted
regarding your current participation and future studies. Optionally, you may choose not to be
contacted for future studies by notifying study personnel of your decision.
RECOURSE IN THE EVENT OF INJURY

This study should not entail any physical or mental risk beyond those described above. We do
not expect complications to occur, but if, for any reason, you feel that continuing this study
would constitute a hardship for you, we will end your participation in the study.

In the event of a medical emergency while participating in this study or medical treatment
required as a result of your participation in this study, you may receive emergency treatment in
the facility you are in or a nearby Department of Defense (military) medical facility (hospital or
clinic). Treatment care will be provided even if you are not eligible to receive such care. Care
will be continued until the medical doctor treating you decides that you are out of immediate
danger. If you are not entitled to care in a military facility, you may be transferred to a private
civilian hospital. The attending doctor or member of the hospital staff will go over the transfer
decision with you before it happens. The military will bill your health insurance for health care
you receive which is not part of the study. You will not be personally billed and you WILL NOT
be expected to pay for medical care at our hospitals. If you are required to pay a deductible, you
may make a claim for reimbursement through the Uniformed Services University Office of
General Counsel. In case you need additional care following discharge from the military
hospital or clinic, a military health care professional will decide whether your need for care is
directly related to being in the study. If your need for care is related to the study, the military
may offer you limited health care at its medical facilities. This additional care is not automatic.

If at any time you believe you have suffered an injury or illness as a result of participating in this
research project, you should contact the Office of Research at the Uniformed Services University
of the Health Sciences, Bethesda, Maryland 20814-4799 at (301) 295-3303. This office can
review the matter with you, can provide information about your rights as a subject, and may be
able to identify resources available to you. If you believe the government or one of the
government's employees (such as a military doctor) has injured you, a claim for damages
(money) against the federal government (including the military) may be filed under the Federal
Tort Claims Act. Information about judicial avenues of compensation is available from the
University's General Counsel at (301) 295-3028.
IF YOU HAVE QUESTIONS OR CONCERNS


If you have questions about this research, you should contact Dr. Mark Ettenhofer, the person in
charge of the study. His phone number at USUHS is 301-295-3279. Even in the evening or on
weekends, you can leave a message at that number. If you have questions about your rights as a
research subject, you should call the Director, Human Research Protections Program at USUHS
at (301) 295-9534. He is your representative and has no connection to the researcher conducting
this study.

By signing this form you are agreeing that this study has been explained to you, that you
understood that explanation, and that you want to take part in this research.


Subject 


Date  of  signature 


Witness 


Date  of  signature 


I certify that the research study has been explained to the above individual, by me or my
research staff, and that the individual understands the nature and purpose, the possible risks and
benefits associated with taking part in this research study. Any questions that have been raised
have been answered.


Investigator 


Date  of  signature 
Administrative Document 2: Parent Study TBI Cohort Informed Consent Form


RESEARCH  STUDY 

Eye Tracking Indicators of Neurocognitive Status after Traumatic Brain Injury - Phase 3

INTRODUCTION 

You  are  being  asked  to  take  part  in  a  research  study.  Before  you  decide  if  you  want  to  be  in  the 
study,  you  need  to  understand  its  risks  and  benefits  so  that  you  can  make  an  informed  decision. 
This  is  known  as  informed  consent. 


This  consent  form  provides  information  about  the  research  study  which  has  been  explained  to 
you.  Once  you  understand  what  it  involves,  you  will  be  asked  to  tell  the  researcher  if  you  want  to 
take  part  in  it.  Your  decision  to  take  part  in  the  study  is  entirely  voluntary.  This  means  that  you 
are  free  to  choose  whether  or  not  you  want  to  be  a  research  subject. 

DESCRIPTION OF THE RESEARCH AND ITS PURPOSE


The  purpose  of  this  study  is  to  develop  and  test  an  eye-tracking  tool  to  accurately  diagnose 
traumatic  brain  injury.  This  tool  works  by  watching  your  eyes  as  you  complete  tasks  on  a 
computer. 

In  this  phase  of  the  experiment,  you  will  complete  computer  tasks  while  your  eye  movements  are 
tracked.  You  will  also  complete  a  number  of  other  thinking  tasks  and  provide  information  about 
your  psychological  functioning.  We  will  compare  your  results  to  the  results  of  other  people 
without  concussions  or  brain  injuries  in  order  to  determine  which  tests  tend  to  show  different 
results  between  groups. 

This  study  is  being  conducted  using  funds  from  the  Uniformed  Services  University  of  the  Health 
Sciences  (USUHS). 


The  principal  investigator  for  this  study  is:  Mark  L.  Ettenhofer,  Ph.D. 

Department  of  Medical  and  Clinical  Psychology 
Uniformed  Services  University  of  the  Health  Sciences 
4301  Jones  Bridge  Road 
Bethesda,  MD  20814-4712 
301-295-3279 

Eligibility  to  Participate: 

You  are  being  asked  to  be  in  this  study  because  you  were  previously  determined  to  be  eligible 
during  the  pre-screening  procedure.  During  the  pre-screening  procedure,  it  was  determined  that 
you are over the age of 18, you have a history of concussion or brain injury, and you do not have
any other medical condition that would be expected to affect your eye or brain functioning or
your use of your hands.


If you are Active Duty Military or a civilian federal employee, it is also required that you provide
a signed Statement of Approval for Participation in Research. Active Duty Military personnel
must  have  this  approval  signed  by  their  supervisor  and  the  Brigade  Commander;  Federal 
Civilians  must  have  this  signed  by  their  Supervisor  before  any  research  participation. 

STUDY  PROCEDURE: 


Your  participation  in  this  study  will  require  about  2.5  hours  during  a  single  visit.  We  will  also 
call  you  6  and  12  months  from  now  to  conduct  follow-up  interviews  of  about  10  minutes  each. 

Before  you  decide  whether  or  not  you  agree  to  participate,  your  ability  to  provide  informed 
consent  will  be  evaluated.  If  you  are  determined  not  to  have  the  ability  to  provide  informed 
consent,  you  will  not  be  permitted  to  participate  in  the  study.  If  you  agree  to  participate,  you  will 
sign this consent form after it has been explained to you and before any study-related procedures
take  place.  All  Active  Duty  and  Veteran  Military  participants  will  also  be  asked  if  they  are 
willing  to  provide  written  permission  for  separate  release  of  information  (ROI)  of  prior  Armed 
Services  Vocational  Aptitude  Battery  (ASVAB)  test  results  (as  available)  for  inclusion  in  the 
study  database.  This  information  will  allow  study  personnel  to  make  comparisons  between  your 
performance on that previous test and the tests you will complete during this study. Declining to
provide permission for release of ASVAB results will not result in exclusion from the study or
affect other aspects of study participation.

Visit 1: (2.5 hours)

We  will  collect  personal  information  about  you  (your  name,  address,  phone  number,  and  the 
name  and  phone  number  of  two  people  you  know)  so  that  we  are  able  to  contact  you  to  complete 
your  telephone  follow-up.  We  will  review  your  medical  history.  You  will  complete  tasks, 
procedures,  and  questionnaires  during  this  time  to  measure  your  thinking  (e.g.,  attention,  reaction 
time,  memory)  and  your  psychological  functioning.  You  may  refuse  to  answer  any  questions 
that  make  you  feel  uncomfortable.  You  will  also  complete  a  series  of  computer  tasks,  about  30 
minutes  in  duration,  during  which  your  eyes  will  be  tracked  by  a  camera.  The  computer  will 
record  your  eye  movements  while  you  complete  the  tasks. 


Telephone Follow-Ups: (10 minutes)

This study will also include a follow-up interview with you at 6 and 12 months after the initial
visit. You will be asked to complete a short survey over the phone that should take
approximately  10-15  minutes.  We  will  ask  about  your  health  and  your  general  life  functioning. 

POSSIBLE  BENEFITS 

The  information  researchers  get  from  this  study  may  help  others  in  the  future.  You  might  not 
personally  benefit  from  being  in  this  study. 
COMPENSATION

If  you  are  active  duty  military  or  a  federal  employee,  you  are  not  eligible  for  compensation. 
Otherwise, you will be compensated $40 for your participation, paid by check after your first
visit (i.e., prior to the telephone follow-ups).

POSSIBLE  RISKS 

There  are  no  known  or  expected  risks  for  participating  in  this  study,  but  you  could  have  side 
effects  that  we  do  not  expect  or  know  to  watch  for  now.  Call  the  principal  investigator  if  you 
have  any  symptoms  or  problems. 

There  is  a  risk  that  one  or  more  of  these  questions  or  tasks  might  make  you  upset  or 
uncomfortable.  If  this  happens,  remember  that  you  will  not  need  to  respond  to  any  questions  or 
complete  any  tasks  that  make  you  feel  upset  or  uncomfortable.  You  may  also  discontinue 
participation  at  any  time  without  penalty. 

Referrals 

If  we  feel  it  is  needed  or  you  request  it,  we  will  provide  you  with  referrals  to  a  mental  health  care 
provider  for  evaluation  or  treatment  at  your  option  and  your  expense.  These  referrals  may  be 
provided  up  to  one  week  from  your  visit  if  the  principal  investigator  judges  that  you  may  benefit 
from  these  services  based  upon  evidence  of  mental  health  difficulties.  However,  this  study  is 
not  intended  to  diagnose  or  treat  any  conditions.  Non-referral  does  not  imply  the  absence  of  a 
mental  health  condition. 

RIGHT  TO  WITHDRAW  FROM  THE  STUDY 

You  may  decide  to  stop  taking  part  in  this  study  at  any  time.  This  will  not  affect  your 
relationship  with  USUHS  in  any  way.  You  can  agree  to  be  in  the  study  now  and  change  your 
mind later. If you decide to withdraw, you may do so in person or over the phone by calling Dr.
Ettenhofer at 301-295-3279. However, any withdrawal, including one made by phone, must be
accompanied by a written, signed request that includes your full printed name and is sent to Dr.
Ettenhofer at the address listed above. Your participation may also be discontinued by study
personnel for reasons including, but not limited to, your potential difficulty following study
procedures. If requested, we will also destroy any information we have collected about you.

PRIVACY AND CONFIDENTIALITY

All  information  you  provide  as  part  of  this  study  will  be  confidential  and  will  be  protected  to  the 
fullest  extent  provided  by  law.  Your  records  related  to  this  study  will  be  accessible  to  those 
persons  directly  involved  in  conducting  this  study  and  members  of  the  Uniformed  Services 
University  of  the  Health  Sciences  Institutional  Review  Board  (IRB),  which  provides  oversight 
for protection of human research volunteers. In addition, the Institutional Review Board at
USUHS and other federal agencies that help protect people involved in research studies
may need to see the information you give us. Other than those groups, records from this study
will be kept private to the fullest extent of the law. Scientific reports that come out of this study
may use the information you have provided, but these reports will not use your name or identify
you in any way.

Records of your participation in this study may only be disclosed in accordance with federal law,
including the Federal Privacy Act, 5 U.S.C. 552a, and its implementing regulations.
Confidentiality  of  your  records  will  be  protected  to  the  extent  possible  under  existing  regulations 
and  laws  but  cannot  be  guaranteed.  Complete  confidentiality  cannot  be  promised,  particularly  for 
military  personnel,  because  information  bearing  on  your  health  may  be  required  to  be  reported  to 
appropriate  medical  or  command  authorities. 

Personal  contact  information  may  be  retained  for  the  purposes  of  completing  this  study  and  to 
notify  you  of  future  studies  and  assess  your  interest  in  participation.  You  will  only  be  contacted 
regarding  your  current  participation  and  future  studies.  Optionally,  you  may  choose  to  not  be 
contacted  for  future  studies  by  notifying  study  personnel  of  your  decision. 

RECOURSE  IN  THE  EVENT  OF  INJURY 

This  study  should  not  entail  any  physical  or  mental  risk  beyond  those  described  above.  We  do 
not  expect  complications  to  occur,  but  if,  for  any  reason,  you  feel  that  continuing  this  study 
would  constitute  a  hardship  for  you,  we  will  end  your  participation  in  the  study. 

In  the  event  of  a  medical  emergency  while  participating  in  this  study  or  medical  treatment 
required  as  a  result  of  your  participation  in  this  study,  you  may  receive  emergency  treatment  in 
the  facility  you  are  in  or  a  nearby  Department  of  Defense  (military)  medical  facility  (hospital  or 
clinic). Treatment/care will be provided even if you are not eligible to receive such care. Care
will be continued until the medical doctor treating you decides that you are out of immediate
danger.  If  you  are  not  entitled  to  care  in  a  military  facility,  you  may  be  transferred  to  a  private 
civilian  hospital.  The  attending  doctor  or  member  of  the  hospital  staff  will  go  over  the  transfer 
decision with you before it happens. The military will bill your health insurance for health care
you  receive  which  is  not  part  of  the  study.  You  will  not  be  personally  billed  and  you  WILL  NOT 
be  expected  to  pay  for  medical  care  at  our  hospitals.  If  you  are  required  to  pay  a  deductible  you 
may  make  a  claim  for  reimbursement  through  the  Uniformed  Services  University  Office  of 
General  Counsel.  In  case  you  need  additional  care  following  discharge  from  the  military 
hospital  or  clinic,  a  military  health  care  professional  will  decide  whether  your  need  for  care  is 
directly  related  to  being  in  the  study.  If  your  need  for  care  is  related  to  the  study,  the  military 
may  offer  you  limited  health  care  at  its  medical  facilities.  This  additional  care  is  not  automatic. 

If  at  any  time  you  believe  you  have  suffered  an  injury  or  illness  as  a  result  of  participating  in  this 
research  project,  you  should  contact  the  Office  of  Research  at  the  Uniformed  Services  University 
of  the  Health  Sciences,  Bethesda,  Maryland  20814-4799  at  (301)  295-3303.  This  office  can 
review  the  matter  with  you,  can  provide  information  about  your  rights  as  a  subject,  and  may  be 
able  to  identify  resources  available  to  you.  If  you  believe  the  government  or  one  of  the 
government's  employees  (such  as  a  military  doctor)  has  injured  you,  a  claim  for  damages 
(money) against the federal government (including the military) may be filed under the Federal
Tort Claims Act. Information about judicial avenues of compensation is available from the
University's  General  Counsel  at  (301)  295-3028. 
IF YOU HAVE QUESTIONS OR CONCERNS

If you have questions about this research, you should contact Dr. Mark Ettenhofer, the person in
charge  of  the  study.  His  phone  number  at  USUHS  is  301-295-3279.  Even  in  the  evening  or  on 
weekends,  you  can  leave  a  message  at  that  number.  If  you  have  questions  about  your  rights  as  a 
research  subject,  you  should  call  the  Director,  Human  Research  Protections  Program  at  USUHS 
at  (301)  295-9534.  He  is  your  representative  and  has  no  connection  to  the  researcher  conducting 
this  study. 


By  signing  this  form  you  are  agreeing  that  this  study  has  been  explained  to  you,  that  you 
understood  that  explanation,  and  that  you  want  to  take  part  in  this  research. 


Subject 


Date  of  signature 


Witness 


Date  of  signature 


I  certify  that  the  research  study  has  been  explained  to  the  above  individuals,  by  me  or  my 
research  staff,  and  that  the  individual  understands  the  nature  and  purpose,  the  possible  risks  and 
benefits associated with taking part in this research study. Any questions that have been raised
have  been  answered. 


Investigator 


Date  of  signature 
Administrative Document 3: Simulator Study Recruitment Flyer


Healthy  Volunteers 
Needed 

FOR  PARTICIPATION  IN  EYE  TRACKING  RESEARCH  STUDY 


Eligible  volunteers  will  complete  computer  tasks  while  an  eye  tracker  records  their  eye  movements. 
Volunteers will also take a series of attention, reaction time, and memory tasks.

This  research  is  being  conducted  at  the 

Uniformed  Services  University  of  the  Health  Sciences  (USUHS)  in  Bethesda,  MD 

For  more  information,  please  contact: 

Mark  Ettenhofer,  Ph.D.,  at 

eyetracking@usuhs.edu

301-295-3279 




Administrative  Document  4:  Parent  Study  TBI  Cohort  Recruitment  Flyer 


HAVE YOU EVER HAD A CONCUSSION
OR TRAUMATIC BRAIN INJURY (TBI)?




Volunteers with a history of concussion or brain injury are needed to test a noninvasive, computerized
eye tracking method for measuring brain functions.

Total participation time is about 2.5 hours.

Eligible volunteers will complete computer tasks while the eye tracker records their eye movements
with a high-speed camera, along with a series of other tests of attention, reaction time, memory, and
psychological functions.

This research is being conducted at the Uniformed Services University of the Health Sciences (USUHS)
in Bethesda, MD

For more information, please contact: Mark Ettenhofer, Ph.D., eyetracking@usuhs.mil, 301-295-3279